openmolecules.org

 
Can DW handle big data? [message #682] Tue, 22 October 2019 08:00
greatzdl
Messages: 1
Registered: March 2019
Junior Member
Dear Thomas,
I have used DataWarrior for many years, ever since it was released as an open-source program. Thank you very much for your contribution.

Over these years I have found it hard to open data files with more than 3 million rows, especially when they contain structure columns. Do you have a solution for this problem? Do you plan to use multithreading to open large data files?

I hope to get your reply.

Best wishes

DaRong
Re: Can DW handle big data? [message #688 is a reply to message #682] Thu, 24 October 2019 20:59
thomas
Messages: 648
Registered: June 2014
Senior Member
Dear DaRong,

3 million rows is already a lot. For very large files I recommend running DataWarrior on Linux, because there you can easily increase the memory maximum, so that memory at least is not a problem. DataWarrior uses multithreading for most functions that benefit from it. However, reading a file is a serial process and cannot easily be parallelized. I could possibly gain some performance by distributing the data analysis that follows file loading across multiple cores. I will put it on the agenda, but not before the next release, which I anticipate before the end of the year.
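
The split described above, a serial read followed by per-row analysis spread over several cores, might be sketched in Java roughly as follows. This is only an illustration under stated assumptions and not DataWarrior's actual code: the class `LoadThenAnalyzeSketch`, its methods, and the tab-separated sample rows are all hypothetical, and `analyzeRow` merely counts fields as a stand-in for a real per-row calculation. The heap limit printed in `main` is the one governed by the JVM's standard `-Xmx` option; how a given DataWarrior installation passes that option is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names come from DataWarrior.
class LoadThenAnalyzeSketch {

    // Step 1: reading is serial, one line after the other.
    static List<String> readRows(BufferedReader reader) throws IOException {
        List<String> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            rows.add(line);
        }
        return rows;
    }

    // Stand-in for an expensive per-row analysis: counts tab-separated fields.
    static int analyzeRow(String row) {
        return row.split("\t", -1).length;
    }

    // Step 2: rows are independent of each other, so the analysis can be
    // distributed over the available cores. Result order matches row order.
    static List<Integer> analyzeInParallel(List<String> rows) {
        return rows.parallelStream()
                   .map(LoadThenAnalyzeSketch::analyzeRow)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Upper bound of the heap this JVM may use (set with -Xmx).
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("JVM heap limit: " + maxHeapMb + " MB");

        // Made-up sample data standing in for a large file.
        String data = "CCO\tethanol\nc1ccccc1\tbenzene\t78.11\n";
        List<String> rows = readRows(new BufferedReader(new StringReader(data)));
        System.out.println(analyzeInParallel(rows));
    }
}
```

Only the second step gains from additional cores; the read loop stays sequential, which is why loading time itself would not shrink with such a change.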

Thanks and best wishes,

Thomas
Re: Can DW handle big data? [message #694 is a reply to message #688] Sun, 27 October 2019 17:01
nbehrnd
Messages: 208
Registered: June 2019
Senior Member
In addition, DaRong, if you work in Linux and are limited by the RAM available
on your computer, you may supplement the «working memory» with a swap partition.
Swap is slower than true RAM in read-write access, especially if it sits on an
HDD platter, but it offers a noticeable benefit and is quickly set up
(e.g. in an Ubuntu session on AWS).

A possible primer may be
https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-18-04/

Norwid