Extend TrainedModelSizeStats to include the actual OS-reported (on Linux) memory usage for trained model deployments (pytorch_inference process), instead of relying solely on estimated memory values.
Background
Currently, TrainedModelSizeStats reports:
- model_size_bytes - the size of the model definition
- required_native_memory_bytes - an estimated memory requirement calculated from the model definition length

This estimated value is computed in TransportGetTrainedModelsStatsAction.java:
long estimatedMemoryUsageBytes = totalDefinitionLength > 0L
    ? StartTrainedModelDeploymentAction.estimateMemoryUsageBytes(
        model.getModelId(),
        totalDefinitionLength,
        model.getPerDeploymentMemoryBytes(),
        model.getPerAllocationMemoryBytes(),
        numberOfAllocations
    )
    : 0L;
For anomaly detection jobs, PR #131981 (corresponding to ml-cpp#2846) added actual OS memory reporting via getrusage RSS values. This provides much more accurate information about real memory consumption.
Proposed Changes
Add new fields to TrainedModelSizeStats:
- runtime_native_memory_bytes - current resident set size (RSS) as reported by the OS
- max_runtime_native_memory_bytes - peak resident set size as reported by the OS

Update the Java side to consume these values from the pytorch_inference native process output (requires corresponding ml-cpp changes).
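A minimal sketch of how the extended stats object might look. This is a simplified stand-in, not the actual class: the real TrainedModelSizeStats also implements wire serialization and XContent rendering, and the -1 "not yet reported" sentinel used here is an assumption, not an established contract.

```java
// Illustrative sketch only: a simplified stand-in for TrainedModelSizeStats.
// Field names follow the proposal above; serialization plumbing is omitted.
public final class TrainedModelSizeStatsSketch {

    // Assumed sentinel meaning "the native process has not reported RSS yet".
    public static final long NOT_REPORTED = -1L;

    private final long modelSizeBytes;
    private final long requiredNativeMemoryBytes;
    private final long runtimeNativeMemoryBytes;     // current RSS from the OS
    private final long maxRuntimeNativeMemoryBytes;  // peak RSS from the OS

    public TrainedModelSizeStatsSketch(
        long modelSizeBytes,
        long requiredNativeMemoryBytes,
        long runtimeNativeMemoryBytes,
        long maxRuntimeNativeMemoryBytes
    ) {
        this.modelSizeBytes = modelSizeBytes;
        this.requiredNativeMemoryBytes = requiredNativeMemoryBytes;
        this.runtimeNativeMemoryBytes = runtimeNativeMemoryBytes;
        this.maxRuntimeNativeMemoryBytes = maxRuntimeNativeMemoryBytes;
    }

    public long modelSizeBytes() { return modelSizeBytes; }

    public long requiredNativeMemoryBytes() { return requiredNativeMemoryBytes; }

    public long runtimeNativeMemoryBytes() { return runtimeNativeMemoryBytes; }

    public long maxRuntimeNativeMemoryBytes() { return maxRuntimeNativeMemoryBytes; }

    /** True once the native process has reported at least one RSS sample. */
    public boolean hasRuntimeMemory() {
        return runtimeNativeMemoryBytes > NOT_REPORTED;
    }
}
```

Keeping the runtime fields separate from required_native_memory_bytes lets callers compare the estimate against observed usage without changing the meaning of the existing field.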
Update the stats retrieval logic in TransportGetTrainedModelsStatsAction and related classes to populate and return these values for running deployments.
Consider backward compatibility: the estimated required_native_memory_bytes field should remain for deployments that haven't reported actual usage yet, and for models that aren't currently deployed.
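The fallback described above could be sketched as follows. The helper class name and the -1 sentinel are hypothetical; they do not come from the actual codebase.

```java
// Hypothetical helper illustrating the backward-compatibility fallback:
// prefer the OS-reported RSS once the deployment has sent one, otherwise
// fall back to the estimate derived from the model definition length.
public final class MemoryStatsFallback {

    /** Assumed sentinel meaning "the native process has not reported RSS yet". */
    public static final long NOT_REPORTED = -1L;

    private MemoryStatsFallback() {}

    public static long effectiveMemoryBytes(long runtimeNativeMemoryBytes, long requiredNativeMemoryBytes) {
        return runtimeNativeMemoryBytes > NOT_REPORTED
            ? runtimeNativeMemoryBytes
            : requiredNativeMemoryBytes;
    }
}
```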
Files likely to be modified
- x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/TrainedModelSizeStats.java - Add new fields
- x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportGetTrainedModelsStatsAction.java - Populate actual memory values
- x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/pytorch/process/PyTorchResultProcessor.java - Process memory stats from native process
- x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/pytorch/results/PyTorchResult.java - Parse memory stats if included in results
- x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/deployment/DeploymentManager.java - Store/expose memory stats

API Changes
The GET _ml/trained_models/<model_id>/_stats API response would include additional fields:
{
  "model_size_stats": {
    "model_size_bytes": 438123456,
    "required_native_memory_bytes": 876246912,
    "runtime_native_memory_bytes": 892452864,
    "max_runtime_native_memory_bytes": 923845632
  }
}
Dependencies
This issue depends on the corresponding ml-cpp changes elastic/ml-cpp#2885 to report actual memory usage from the pytorch_inference process.
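For illustration only, the native process might report memory usage in its results stream roughly like this. The object and field names here are hypothetical; the actual output format is defined by the ml-cpp changes in elastic/ml-cpp#2885.

```json
{
  "memory_usage": {
    "runtime_native_memory_bytes": 892452864,
    "max_runtime_native_memory_bytes": 923845632
  }
}
```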