Dynamic Automatic Differentiation of GPU Broadcast Kernels
Jarrett Revels, Tim Besard, Valentin Churavy, Bjorn De Sutter, and Alan Edelman
Abstract
We show how forward-mode automatic differentiation (AD) can be employed within larger reverse-mode computations to dynamically differentiate broadcast operations in a GPU-friendly manner. Our technique fully exploits the broadcast Jacobian's inherent sparsity structure, and unlike a pure reverse-mode approach, this "mixed-mode" approach does not require a backwards pass over the broadcasted operation's subgraph, obviating the need for several reverse-mode-specific programmability restrictions on user-authored broadcast operations. Most notably, this approach allows broadcast fusion in primal code despite the presence of data-dependent control flow. We discuss an experiment in which a Julia implementation of our technique outperformed pure reverse-mode TensorFlow and Julia implementations for differentiating through broadcast operations within an HM-LSTM cell update calculation.
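
The technique rests on a simple structural fact: the Jacobian of a broadcast y = f.(x, w) is elementwise diagonal, i.e. ∂y[i]/∂x[i] equals ∂f/∂x evaluated at element i, and every off-diagonal entry is zero. Evaluating f on dual numbers during the primal pass therefore produces the primal values and all nonzero Jacobian entries inside the same fused kernel, and the backward pass reduces to elementwise multiplication of the incoming sensitivity with the stored partials; no backwards pass over f's subgraph is ever taken, which is why f may freely contain data-dependent control flow. The following minimal Julia sketch illustrates the idea with a hand-rolled single-partial dual type; it is an illustration only, not the authors' implementation (that lives in the MixedModeBroadcastAD.jl repository linked below), and every name in it is hypothetical.

# Minimal sketch of mixed-mode broadcast differentiation
# (illustrative names only; not the paper's implementation).
struct Dual{T<:Real} <: Real
    value::T    # primal value
    partial::T  # derivative w.r.t. the one seeded input
end

Base.:+(a::Dual, b::Dual) = Dual(a.value + b.value, a.partial + b.partial)
Base.:*(a::Dual, b::Dual) = Dual(a.value * b.value,
                                 a.value * b.partial + a.partial * b.value)
function Base.tanh(a::Dual)
    t = tanh(a.value)
    return Dual(t, (1 - t^2) * a.partial)
end
Base.:<(a::Dual, b::Dual) = a.value < b.value  # branches compare primal values

# Forward pass: broadcasting f over duals yields the primal together with
# the nonzero (diagonal) Jacobian entries, so only O(n) numbers are stored
# per input rather than an n-by-n matrix. (A multi-partial dual, as in
# ForwardDiff.jl, would seed both inputs in a single fused pass.)
function forward(f, x, y)
    dx = broadcast((a, b) -> f(Dual(a, one(a)), Dual(b, zero(b))), x, y)
    dy = broadcast((a, b) -> f(Dual(a, zero(a)), Dual(b, one(b))), x, y)
    return map(d -> d.value, dx), map(d -> d.partial, dx), map(d -> d.partial, dy)
end

# Backward pass: no traversal of f's subgraph, just elementwise products
# of the incoming sensitivity with the stored partials.
backward(Δ, ∂x, ∂y) = (Δ .* ∂x, Δ .* ∂y)

# f contains data-dependent control flow, which a pure reverse-mode tape
# would have to record or forbid, but which forward mode evaluates freely.
f(a, b) = a < b ? a * tanh(b) : a * b

x, y = rand(4), rand(4)
primal, ∂x, ∂y = forward(f, x, y)
Δx, Δy = backward(ones(4), ∂x, ∂y)  # gradients w.r.t. x and y

Because the partials fall out of the same fused broadcast that computes the primal, the scheme preserves broadcast fusion on the GPU, which is the property exercised by the HM-LSTM benchmark mentioned in the abstract.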

Downloads

  • NIPS18 SystemsForML submission.pdf (full text, open access, PDF, 339.47 KB)
  • NIPS18 SystemsForML preprint.pdf (full text, open access, PDF, 435.44 KB)

Citation

Chicago
Revels, Jarrett, Tim Besard, Valentin Churavy, Bjorn De Sutter, and Alan Edelman. 2018. “Dynamic Automatic Differentiation of GPU Broadcast Kernels.” In 2018 Conference on Neural Information Processing Systems : Proceedings.
APA
Revels, J., Besard, T., Churavy, V., De Sutter, B., & Edelman, A. (2018). Dynamic automatic differentiation of GPU broadcast kernels. 2018 Conference on Neural Information Processing Systems : proceedings. Presented at the 2018 Conference on Neural Information Processing Systems.
Vancouver
1. Revels J, Besard T, Churavy V, De Sutter B, Edelman A. Dynamic automatic differentiation of GPU broadcast kernels. 2018 Conference on Neural Information Processing Systems : proceedings. 2018.
MLA
Revels, Jarrett et al. “Dynamic Automatic Differentiation of GPU Broadcast Kernels.” 2018 Conference on Neural Information Processing Systems : Proceedings. 2018. Print.
@inproceedings{8607909,
  abstract     = {We show how forward-mode automatic differentiation (AD) can be employed within larger reverse-mode computations to dynamically differentiate broadcast operations in a GPU-friendly manner. Our technique fully exploits the broadcast Jacobian's inherent sparsity structure, and unlike a pure reverse-mode approach, this ``mixed-mode'' approach does not require a backwards pass over the broadcasted operation's subgraph, obviating the need for several reverse-mode-specific programmability restrictions on user-authored broadcast operations. Most notably, this approach allows broadcast fusion in primal code despite the presence of data-dependent control flow. We discuss an experiment in which a Julia implementation of our technique outperformed pure reverse-mode TensorFlow and Julia implementations for differentiating through broadcast operations within an HM-LSTM cell update calculation.},
  author       = {Revels, Jarrett and Besard, Tim and Churavy, Valentin and De Sutter, Bjorn and Edelman, Alan},
  booktitle    = {2018 Conference on Neural Information Processing Systems : proceedings},
  language     = {eng},
  location     = {Montreal, Canada},
  title        = {Dynamic automatic differentiation of GPU broadcast kernels},
  url          = {https://github.com/jrevels/MixedModeBroadcastAD.jl},
  year         = {2018},
}