Adaptive Computation Modules: Granular Conditional Computation for Efficient Inference

Wójcik, Bartosz; Devoto, Alessio; Pustelnik, Karol; Minervini, Pasquale; Scardapane, Simone

doi:10.1609/aaai.v39i20.35453

Date issued

2025

Type

book section

Subtype

conference proceedings

Author

Wójcik Bartosz

Devoto Alessio

Pustelnik Karol

Minervini Pasquale

Scardapane Simone

Editor

Walsh Toby

Shah Julie

Kolter Zico

Volume

39

Book title / Journal title

Proceedings of the 39th AAAI Conference on Artificial Intelligence

Place

Washington

Publisher

AAAI Press

Volume title

AAAI-25 Technical Tracks 20

Pages

21510-21518

ISBN

978-1-57735-897-8

Series

Proceedings of the AAAI Conference on Artificial Intelligence

Series ISSN

2159-5399

Series eISSN

2374-3468

URL

https://ojs.aaai.org/index.php/AAAI/article/view/35453

Date accessed

2025-05-06

Language

English

Book language / Journal language

English

Abstract in English

While transformer models have been highly successful, they are computationally inefficient. We observe that for each layer, the full width of the layer may be needed only for a small subset of tokens inside a batch and that the "effective" width needed to process a token can vary from layer to layer. Motivated by this observation, we introduce the Adaptive Computation Module (ACM), a generic module that dynamically adapts its computational load to match the estimated difficulty of the input on a per-token basis. An ACM consists of a sequence of learners that progressively refine the output of their preceding counterparts. An additional gating mechanism determines the optimal number of learners to execute for each token. We also propose a distillation technique to replace any pre-trained model with an "ACMized" variant. Our evaluation of transformer models in computer vision and speech recognition demonstrates that substituting layers with ACMs significantly reduces inference costs without degrading the downstream accuracy for a wide interval of user-defined budgets.
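The per-token mechanism summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on assumptions not stated in this record: each learner's output is added to the running result (additive refinement), the gate is a simple linear scorer that picks k in 1..N by argmax, and each learner is a tiny ReLU MLP. All names (`make_learner`, `choose_num_learners`, `acm_forward`, `compute_fraction`) are hypothetical, and gate training and the distillation step are omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Learner = Callable[[Vector], Vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def make_learner(d_model: int, d_hidden: int, rng: random.Random) -> Learner:
    """Hypothetical learner: a tiny ReLU MLP mapping d_model -> d_model."""
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]

    def learner(x: Vector) -> Vector:
        h = [max(0.0, dot(row, x)) for row in w1]  # hidden ReLU activations
        return [dot(row, h) for row in w2]

    return learner


def choose_num_learners(x: Vector, gate_weights: List[Vector]) -> int:
    """Hypothetical gate: one linear score per option k = 1..N, pick the argmax."""
    scores = [dot(row, x) for row in gate_weights]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1  # at least one learner always runs


def acm_forward(tokens: List[Vector], learners: List[Learner],
                gate_weights: List[Vector]) -> Tuple[List[Vector], List[int]]:
    """Per-token conditional computation: run only the first k learners and
    sum their outputs, so each later learner refines the earlier result."""
    outputs, used = [], []
    for x in tokens:
        k = choose_num_learners(x, gate_weights)
        y = learners[0](x)
        for learner in learners[1:k]:
            y = [a + b for a, b in zip(y, learner(x))]  # additive refinement
        outputs.append(y)
        used.append(k)
    return outputs, used


def compute_fraction(used: List[int], num_learners: int) -> float:
    """Share of learner evaluations executed, relative to always running all N."""
    return sum(used) / (len(used) * num_learners)


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, n_learners = 8, 4
    learners = [make_learner(d_model, 16, rng) for _ in range(n_learners)]
    gate_weights = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(n_learners)]
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(5)]
    _, used = acm_forward(tokens, learners, gate_weights)
    print("learners per token:", used)
    print(f"fraction of full width used: {compute_fraction(used, n_learners):.2f}")
```

In this sketch, tokens that the gate scores as "easy" stop after the first learner, while harder ones run more learners, so the average cost reported by `compute_fraction` falls below 1.0 whenever some tokens skip learners.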

dc.affiliation	Szkoła Doktorska Nauk Ścisłych i Przyrodniczych
dc.conference	39th AAAI Conference on Artificial Intelligence
dc.conference.city	Philadelphia, Pennsylvania
dc.conference.country	United States
dc.conference.datefinish	2025-03-04
dc.conference.datestart	2025-02-25
dc.conference.series	National Conference of the American Association for Artificial Intelligence
dc.conference.seriesshortcut	AAAI
dc.conference.shortcut	AAAI-25
dc.conference.weblink	https://aaai.org/conference/aaai/aaai-25/
dc.contributor.author	Wójcik, Bartosz - 422840
dc.contributor.author	Devoto, Alessio
dc.contributor.author	Pustelnik, Karol
dc.contributor.author	Minervini, Pasquale
dc.contributor.author	Scardapane, Simone
dc.contributor.editor	Walsh, Toby
dc.contributor.editor	Shah, Julie
dc.contributor.editor	Kolter, Zico
dc.date.accession	2025-05-06
dc.date.accessioned	2025-05-06T09:05:19Z
dc.date.available	2025-05-06T09:05:19Z
dc.date.createdat	2025-04-14T09:14:19Z	en
dc.date.issued	2025
dc.date.openaccess	0
dc.description.accesstime	at the time of publication
dc.description.conftype	international
dc.description.physical	21510-21518
dc.description.series	Proceedings of the AAAI Conference on Artificial Intelligence
dc.description.version	final publisher's version
dc.description.volume	39
dc.identifier.bookweblink	https://ruj.uj.edu.pl/entities/publication/f3a7e81e-9027-4663-b2ad-4aea425c76c2
dc.identifier.doi	10.1609/aaai.v39i20.35453
dc.identifier.isbn	978-1-57735-897-8
dc.identifier.serieseissn	2374-3468
dc.identifier.seriesissn	2159-5399
dc.identifier.uri	https://ruj.uj.edu.pl/handle/item/552028
dc.identifier.weblink	https://ojs.aaai.org/index.php/AAAI/article/view/35453
dc.language	eng
dc.language.container	eng
dc.place	Washington
dc.publisher	AAAI Press
dc.rights	Bibliographic description only
dc.rights.licence	Other open licence
dc.share.type	other
dc.subtype	ConferenceProceedings
dc.title	Adaptive Computation Modules: Granular Conditional Computation for Efficient Inference
dc.title.container	Proceedings of the 39th AAAI Conference on Artificial Intelligence
dc.title.volume	AAAI-25 Technical Tracks 20
dc.type	BookSection
dspace.entity.type	Publication	en

Affiliations

Szkoła Doktorska Nauk Ścisłych i Przyrodniczych

Wójcik, Bartosz

Wydział Matematyki i Informatyki

Wójcik, Bartosz

No affiliation

Devoto, Alessio

Pustelnik, Karol

Minervini, Pasquale

Scardapane, Simone

Walsh, Toby

Shah, Julie

Kolter, Zico

Collections

Research publications