Improving LLM Inference Efficiency through Chunked Prefills and Piggybacked Decodes
