The request-based load-aware scorers we have today, such as queue-scorer and active-request-scorer, track only the number of requests queued or actively being served. It might be valuable to investigate more granular tracking that also takes request length into consideration, especially with the move towards tokenization and for heterogeneous workloads, where request lengths can vary widely.
It could make sense to introduce a flag in the existing scorers, or to add new scorers altogether.
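To make the idea concrete, here is a minimal sketch of how a flag could switch a load scorer from counting requests to summing their token lengths. It is illustrative only and does not reflect how queue-scorer or active-request-scorer are implemented: the `Endpoint` type, the `token_aware` flag, the `score_endpoints` function, and the min-max normalization (least-loaded endpoint scores 1.0) are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical view of one serving endpoint (not an existing type)."""
    name: str
    # Prompt length, in tokens, of each request queued or in flight.
    pending_token_counts: list[int] = field(default_factory=list)


def load(endpoint: Endpoint, token_aware: bool) -> int:
    """Return the endpoint's load: total pending tokens, or request count."""
    if token_aware:
        return sum(endpoint.pending_token_counts)
    return len(endpoint.pending_token_counts)


def score_endpoints(endpoints: list[Endpoint], token_aware: bool = False) -> dict[str, float]:
    """Score endpoints in [0, 1]; the least-loaded endpoint gets 1.0.

    Min-max normalization is an assumption chosen for this sketch.
    """
    loads = {e.name: load(e, token_aware) for e in endpoints}
    if not loads:
        return {}
    lo, hi = min(loads.values()), max(loads.values())
    if hi == lo:
        # All endpoints are equally loaded, so none is preferred.
        return {name: 1.0 for name in loads}
    return {name: (hi - value) / (hi - lo) for name, value in loads.items()}


# Endpoint "a" holds three short requests; "b" holds one very long request.
endpoints = [
    Endpoint("a", [100, 100, 100]),
    Endpoint("b", [8000]),
]
by_count = score_endpoints(endpoints, token_aware=False)
by_tokens = score_endpoints(endpoints, token_aware=True)
```

In this example the count-based mode prefers `b` (one request versus three), while the token-aware mode prefers `a` (300 pending tokens versus 8000). This is the kind of divergence that benchmarking would need to show matters in practice.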
Benchmarking and a clear demonstration of value are expected as prerequisites to landing this work.