HAPP: high-accuracy pipeline for processing deep metabarcoding data

2025
journal article
article
dc.abstract.en
Deep metabarcoding offers an efficient and reproducible approach to biodiversity monitoring, but noisy data and incomplete reference databases challenge accurate diversity estimation and taxonomic annotation. Here, we introduce a novel algorithm, NEEAT, for removing spurious operational taxonomic units (OTUs) originating from nuclear-embedded mitochondrial DNA sequences (NUMTs) or sequencing errors. It integrates ‘echo’ signals across samples with the identification of unusual evolutionary patterns among similar DNA sequences. We also extensively benchmark current tools for chimera removal, taxonomic annotation and OTU clustering of deep metabarcoding data. The best performing tools/parameter settings are integrated into HAPP, a high-accuracy pipeline for processing deep metabarcoding data. Tests using CO1 data from BOLD and large-scale metabarcoding data on insects demonstrate that HAPP significantly outperforms existing methods, while enabling efficient analysis of extensive datasets by parallelizing computations across taxonomic groups.
dc.affiliation
Faculty of Biology: Institute of Environmental Sciences (Wydział Biologii : Instytut Nauk o Środowisku)
dc.contributor.author
Sundh, John
dc.contributor.author
Granqvist, Emma
dc.contributor.author
Iwaszkiewicz-Eggebrecht, Ela
dc.contributor.author
Manoharan, Lokeshwaran
dc.contributor.author
van Dijk, Laura J. A.
dc.contributor.author
Goodsell, Robert
dc.contributor.author
Godeiro, Nerivania N.
dc.contributor.author
Bellini, Bruno C.
dc.contributor.author
Orsholm, Johanna
dc.contributor.author
Łukasik, Piotr - 398824
dc.contributor.author
Miraldo, Andreia
dc.contributor.author
Roslin, Tomas
dc.contributor.author
Tack, Ayco J. M.
dc.contributor.author
Andersson, Anders F.
dc.contributor.author
Ronquist, Fredrik
dc.date.accessioned
2025-11-21T15:40:51Z
dc.date.available
2025-11-21T15:40:51Z
dc.date.createdat
2025-11-19T09:38:49Z
dc.date.issued
2025
dc.date.openaccess
0
dc.description.accesstime
At the time of publication
dc.description.number
11
dc.description.version
Final published version (publisher's version)
dc.description.volume
21
dc.identifier.articleid
e1013558
dc.identifier.doi
10.1371/journal.pcbi.1013558
dc.identifier.eissn
1553-7358
dc.identifier.issn
1553-734X
dc.identifier.project
DRC AI
dc.identifier.uri
https://ruj.uj.edu.pl/handle/item/565862
dc.language
eng
dc.language.container
eng
dc.rights
License granted. Attribution 4.0 International
dc.rights.licence
CC-BY
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/legalcode.pl
dc.share.type
Open journal
dc.subtype
Article
dc.title
HAPP: high-accuracy pipeline for processing deep metabarcoding data
dc.title.journal
PLoS Computational Biology
dc.type
JournalArticle
dspace.entity.type
Publication