This article examines the potential of deep reinforcement learning (DRL) methods in algorithmic commodities trading. The study presents a novel time-discretization scheme that adjusts to market volatility, improving the statistical properties of the sampled financial time series. Two policy gradient algorithms, one actor-based and one actor-critic-based, are introduced to train trading agents that account for transaction costs and risk. These agents use convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) as function approximators to map historical price observations to market positions. Backtesting on front-month natural gas futures shows that the DRL models outperform the buy-and-hold strategy, increasing the Sharpe ratio by 83%. The risk profile of the agents can be customized through a hyperparameter that controls risk sensitivity. Actor-based models outperform actor-critic-based models, and CNN-based models show a slight advantage over LSTM-based models.
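The abstract does not specify implementation details, so the following is only a minimal sketch, assuming a PyTorch-style setup, of how a convolutional policy network might map a window of historical price observations to a position in [-1, 1], with a risk-sensitivity coefficient penalizing return variance in the training objective. All names and values here (`PriceCNNPolicy`, `risk_adjusted_loss`, `risk_aversion`, the cost level, the window size) are hypothetical placeholders, not the paper's actual architecture or parameters.

```python
# Illustrative sketch only; not the paper's exact model.
import torch
import torch.nn as nn


class PriceCNNPolicy(nn.Module):
    """Maps a window of past price features to a long/short position in [-1, 1]."""

    def __init__(self, n_features: int = 4, window: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(32, 1),
            nn.Tanh(),  # squashes the output into a position in [-1, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, n_features, window)
        return self.net(x).squeeze(-1)


def risk_adjusted_loss(positions, returns, prev_positions,
                       cost=1e-4, risk_aversion=0.1):
    """Negative mean strategy return, penalized by transaction costs and by
    return variance; risk_aversion plays the role of the abstract's
    risk-sensitivity hyperparameter (values are placeholders)."""
    strat_returns = positions * returns - cost * (positions - prev_positions).abs()
    return -(strat_returns.mean() - risk_aversion * strat_returns.var())
```

A full agent would generate trajectories with such a network and update it with a policy gradient step on an objective of this kind; the abstract reports both an actor-based and an actor-critic-based variant, whose exact formulations are given in the paper itself.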