DataWarrior Features
General
- Interactive data visualization and analysis
- Built-in chemical intelligence
- Real-time data filtering on alphanumerical and chemical criteria
- Prediction of molecular properties from the chemical structure
- Dedicated cheminformatics modules support drug discovery
- The installation includes a user manual and many example files
- Runs on Linux, Macintosh (with Retina support) and Windows
- Computationally demanding algorithms use all processor cores
Files
- Reads and writes its own native text-based file formats
- Imports TAB-delimited TXT, CSV and SD files (V2000 & V3000), interprets SMILES codes
- Imports from clipboard content
- Exports TAB-delimited TXT and SD files (V2000 & V3000)
- Flexible file merge and append options
Views
- Table view with columns containing alphanumerical or chemical information
- Versatile 2D view for scatter plots, bar & pie charts, box plots, ...
- Freely rotatable 3D view for scatter plots & bar charts
- Dedicated chemical structure view with optional alphanumerical data
- Form-based view with form designer and form-based data editing
- Multiple views can be shown side by side or stacked on top of each other
- Views can be highly customized to reveal multiple dimensions of the data
Filter Types
- Text filters with support for regular expressions
- Data range sliders for numerical and date columns
- Category filters with individually selectable categories
- Category browser to manually or automatically switch categories
- Substructure filter with flexible query features and real-time filtering
- Filtering on various kinds of compound similarity (see Descriptors)
- Special filters screen against lists of compounds or substructures
- Reaction filtering by similarity, reaction substructure, and retrons
- Filter animations allow for dynamic graphical views
Data Analysis
- Data pivoting and reverse pivoting
- Calculation of new columns from custom expressions
- Principal Component Analysis
- Self-Organizing Maps
- t-distributed stochastic neighbor embedding (t-SNE)
- Uniform manifold approximation and projection (UMAP)
- Calculation and display of statistical parameters
- Creation and manipulation of persistent row lists for many purposes
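Data pivoting turns a "long" table (one row per item and category) into a "wide" one (one column per category), and reverse pivoting undoes that. The plain-Python sketch below illustrates the general technique only; it is not DataWarrior's implementation, and the column names `compound`, `assay` and `value` are hypothetical.

```python
def pivot(rows, key, category, value):
    """Long -> wide: one output row per distinct `key`, one column per `category` value."""
    wide = {}  # dicts keep insertion order, so output rows follow first appearance
    for row in rows:
        entry = wide.setdefault(row[key], {key: row[key]})
        entry[row[category]] = row[value]  # a later duplicate overwrites an earlier one
    # Missing combinations are simply absent here (a real table would show empty cells)
    return list(wide.values())


def reverse_pivot(rows, key, category, value):
    """Wide -> long: one output row per non-key cell present in each wide row."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col != key:
                long_rows.append({key: row[key], category: col, value: val})
    return long_rows


data = [
    {"compound": "A", "assay": "IC50", "value": 1.2},
    {"compound": "A", "assay": "Ki", "value": 0.4},
    {"compound": "B", "assay": "IC50", "value": 3.5},
]
wide = pivot(data, "compound", "assay", "value")
print(wide)  # [{'compound': 'A', 'IC50': 1.2, 'Ki': 0.4}, {'compound': 'B', 'IC50': 3.5}]
print(reverse_pivot(wide, "compound", "assay", "value") == data)  # True
```

Round-tripping works here because every long row has a unique (key, category) pair; with duplicates, pivoting loses information.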
Cheminformatics
- Fast substructure & compound similarity filtering (see descriptors)
- Calculation of physico-chemical properties like MW, logP, logS, tPSA
- Calculation of druglikeness, flexibility, complexity, atom/ring counts, etc.
- Detection of toxicity risk factors in four toxicity categories
- Enumeration of combinatorial libraries with predefined or custom design
- De novo structure creation using an evolutionary algorithm with flexible fitness criteria
- Principal component analysis and self-organizing maps on chemical descriptors
- 2-dimensional scaling algorithm using chemical and pharmacophore similarities
- Automatic and semi-automatic creation of structure-activity relationship (SAR) tables
- Scaffold analysis (ring systems or Murcko scaffolds)
- Search & Replace functionality on chemical structure columns
- Comparison of two structure files to reveal overlap of similar structures
- 2D atom coordinate generation with unified scaffold orientation
- Activity cliff analysis
- Generation of drug-like or natural-product-like random molecules
- Diverse subset selection and compound clustering
- Consistently uses MDL's concept of Enhanced Stereo Representation
- Generation of conformers with MMFF94 energy minimization
- Conformation explorer with raytracer for photo-realistic molecule images
- Comprehensive support for chemical reactions, reads Biovia databases & reaction SMILES
- Machine learning using chemical descriptors: Applicability check & missing value prediction
- PheSA (Pharmacophore-Enhanced Shape Alignment) superposition of conformers
- Protein-ligand docking with pose scoring and interactive visualization
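For diverse subset selection, one widely used technique is greedy MaxMin picking: start from a seed item and repeatedly add the item farthest from everything already picked. The sketch below assumes compounds are represented as numeric descriptor vectors compared by Euclidean distance; both the MaxMin choice and the distance measure are assumptions for illustration, not a description of how DataWarrior selects subsets. The function name `maxmin_pick` is hypothetical.

```python
import math


def maxmin_pick(points, k, seed=0):
    """Greedy MaxMin: pick up to k indices that are mutually far apart."""
    n = len(points)
    if n == 0 or k <= 0:
        return []
    picked = [seed]
    # Distance from every point to its nearest already-picked point
    nearest = [math.dist(p, points[seed]) for p in points]
    while len(picked) < min(k, n):
        # Choose the unpicked point farthest from the current selection
        nxt = max((i for i in range(n) if i not in picked), key=lambda i: nearest[i])
        picked.append(nxt)
        # Update nearest-picked distances with the newly added point
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return picked


# Two tight pairs plus one outlier in a toy 2D descriptor space
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 4.9), (0.0, 5.0)]
print(maxmin_pick(points, 3))  # [0, 2, 4]
```

The result takes one member from each tight pair plus the outlier. Greedy MaxMin is fast but depends on the seed; clustering-based selection is a common alternative.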
Descriptors
- FragFp: fragment dictionary based binary fingerprint (analogous to MDL keys)
- PathFp: linear atom strands normalized, hashed, binary (analogous to Daylight fingerprints)
- SphereFp: canonical circular fragments, hashed, binary
- SkelSpheres: canonical circular fragments & skeletons, stereo perception, hashed, counts
- OrgFunctions: synthetically accessible organic functionality in similarity tree
- Flexophore: pharmacophore similarity considering diverse conformers and PDB statistics
- RxnFp: reaction similarity, reaction center similarity, reaction periphery similarity
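Some of the descriptors above are binary (FragFp, PathFp, SphereFp) and one stores counts (SkelSpheres). A common way to compare such fingerprints is the Tanimoto coefficient: shared bits over bits set in either, or for count vectors, the sum of element-wise minima over the sum of maxima. The sketch below shows both forms. Whether DataWarrior computes similarity exactly this way is an assumption, and the bit patterns and counts are toy values, not real descriptors.

```python
def tanimoto_binary(a: int, b: int) -> float:
    """Tanimoto similarity of two binary fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention: two empty fingerprints count as identical
    return bin(a & b).count("1") / union


def tanimoto_counts(a: list, b: list) -> float:
    """Generalized Tanimoto for count fingerprints: sum of minima / sum of maxima."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have equal length")
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return 1.0 if den == 0 else num / den


# 3 shared bits out of 5 bits set in either fingerprint
print(tanimoto_binary(0b10110010, 0b10010011))  # 0.6
# minima sum to 5, maxima sum to 7
print(tanimoto_counts([2, 0, 1, 3], [1, 1, 1, 3]))
```

For count vectors containing only 0 and 1, the generalized form gives the same result as the binary one.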
Databases
- All chemical structures in Wikipedia can be downloaded and searched locally.
- Fast structure and target search in ChEMBL database with result retrieval.
- Structure, price and package size search in Enamine building block database.
- Substructure, similarity, author and year search in the Crystallography Open Database (COD) and retrieval of 3D crystal structures.
- Direct access to Oracle, PostgreSQL, MySQL, SQL-Server using custom SQL queries.
- Customized search and retrieval from any database using self-developed plugins.
Automation
- (Almost) any sequence of tasks can be recorded as a macro.
- Macros can be created or edited interactively without scripting.
- Macros make it possible to share or repeat complex tasks on updated or different data.