openmolecules.org Forum: DataWarrior » Functionality » Can DW handle big data? (multithreading, low cost of memory)

Can DW handle big data? [message #682]

Tue, 22 October 2019 08:00

greatzdl
Messages: 1
Registered: March 2019

Junior Member

Dear Thomas,
I have used DataWarrior for many years, ever since it was released as an open-source program. Thank you very much for your contribution.

Over these years, I have found it hard to open data files with more than 3 million rows, especially when they contain structure columns. Do you have any solution to this problem? Do you plan to use multithreading to open large data files?

I hope to get your reply.

Best wishes

DaRong

Re: Can DW handle big data? [message #688 is a reply to message #682]

Thu, 24 October 2019 20:59

thomas
Messages: 736
Registered: June 2014

Senior Member

Dear DaRong,

3 million rows is already a lot. For very large files I recommend using DataWarrior on Linux, because there you can easily raise the maximum memory, so that at least memory is not a problem. DataWarrior uses multithreading for most functions that benefit from it. However, reading a file is a serial process and cannot easily be parallelized. I could possibly gain some performance by distributing the data analysis that follows file loading across multiple cores. I will put it on the agenda, but not before the next release, which I expect before the end of the year.

Thanks and best wishes,

Thomas
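The distinction drawn above, serial file reading versus parallel analysis once the rows are in memory, can be sketched in Java, the language DataWarrior is written in. This is a minimal illustration under stated assumptions, not DataWarrior's actual loader: the class and method names, the tab-separated input and the "string length" analysis are all hypothetical stand-ins. The sketch also prints the JVM's current heap ceiling, which is the limit usually raised with the `-Xmx` option.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical illustration only; this is NOT DataWarrior code.
public class SerialLoadParallelAnalysis {

    // Reads tab-separated rows one line at a time. The start of line N is only
    // known after lines 0..N-1 have been scanned, so this loop is inherently serial.
    static List<String[]> readRows(Reader source) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line.split("\t", -1)); // -1 keeps trailing empty fields
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Hypothetical per-row analysis: length of the value in one column.
    // Once all rows are in memory they are independent, so a parallel stream
    // can spread the work over all available cores.
    static int[] analyzeInParallel(List<String[]> rows, int column) {
        int[] result = new int[rows.size()];
        IntStream.range(0, rows.size())
                 .parallel()
                 .forEach(i -> result[i] = rows.get(i)[column].length()); // each index written once
        return result;
    }

    public static void main(String[] args) {
        // JVM heap ceiling in MiB (typically set with -Xmx); value depends on the machine.
        System.out.println("Max heap (MiB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));

        String tsv = "id\tSmiles\n1\tCCO\n2\tc1ccccc1\n";
        List<String[]> rows = readRows(new StringReader(tsv));
        List<String[]> data = rows.subList(1, rows.size()); // skip the header line
        System.out.println(Arrays.toString(analyzeInParallel(data, 1))); // prints [3, 8]
    }
}
```

The parallel step only pays off when the per-row work is expensive (e.g. processing chemical structures), since splitting trivial work across threads adds overhead of its own.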

Re: Can DW handle big data? [message #694 is a reply to message #688]

Sun, 27 October 2019 17:01

nbehrnd
Messages: 234
Registered: June 2019

Senior Member

In addition, DaRong, if you are working on Linux and are limited by the RAM available
on your computer, you may supplement «working memory» with a swap partition. Although
swap is not as fast to read and write as real RAM, especially if it sits on an HDD
platter, it offers a noticeable benefit and is quickly set up (e.g. in an Ubuntu
session on AWS).

A possible primer is
https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-18-04/

Norwid
