Fierce competition in the market for perishable products forces retailers to use pricing strategies to attract customers. Traditional pricing strategies (e.g., cost-plus, value-based, and inventory-sensitive pricing) adjust a product’s price according to the retailer’s own current situation. However, many retailers lack insight into customer preferences and an understanding of the competitive environment. This model explores a Q-learning pricing mechanism for perishable products that accounts for uncertain demand and customer preferences in a competitive multi-agent retail market (a model-free environment). In the proposed simulation model, agents imitate the behavior of consumers and retailers. Four potential influencing factors (competition, customer preferences, uncertain demand, and product perishability) are built into the pricing decisions. All retailer agents adjust their products’ prices over a finite sales horizon to maximize expected revenue. One retailer agent sets its price using the Q-learning mechanism, while the others adopt traditional pricing strategies. Shortages are allowed, but backlogging is not: demand that cannot be met from stock is lost.
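The Q-learning pricing idea can be illustrated with a minimal tabular sketch in Python. This is not the model's actual implementation: it covers a single learning retailer and leaves out competing retailers and customer preferences. The price levels, horizon, starting stock, learning parameters, and the `sample_demand` function are all hypothetical choices made for illustration. The state is assumed to be the pair (period, remaining stock), the action is a price level, and the reward is the revenue earned in that period.

```python
import random

PRICES = [4.0, 6.0, 8.0, 10.0]          # hypothetical discrete price levels
HORIZON = 10                            # selling periods before the product perishes
START_STOCK = 20                        # initial inventory, no replenishment (assumed)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # assumed learning parameters


def sample_demand(price, rng):
    """Hypothetical uncertain demand: up to 6 customers per period, each
    buying with a probability that falls linearly as the price rises."""
    buy_prob = max(0.0, 1.0 - price / 12.0)
    return sum(1 for _ in range(6) if rng.random() < buy_prob)


def step(period, stock, action, rng):
    """Advance one period. Demand beyond the stock is lost (no backlog)."""
    price = PRICES[action]
    sold = min(sample_demand(price, rng), stock)
    return period + 1, stock - sold, price * sold


def choose_action(q, state, rng, epsilon):
    """Epsilon-greedy choice over the price levels."""
    if rng.random() < epsilon:
        return rng.randrange(len(PRICES))
    values = q[state]
    return values.index(max(values))


def train(episodes=2000, seed=0):
    """Learn a Q-table indexed by (period, stock) with one value per price."""
    rng = random.Random(seed)
    q = {(t, s): [0.0] * len(PRICES)
         for t in range(HORIZON + 1) for s in range(START_STOCK + 1)}
    for _ in range(episodes):
        period, stock = 0, START_STOCK
        while period < HORIZON and stock > 0:
            state = (period, stock)
            action = choose_action(q, state, rng, EPSILON)
            period, stock, reward = step(period, stock, action, rng)
            # Unsold stock perishes at the horizon, so the terminal value is 0.
            terminal = period >= HORIZON or stock == 0
            target = reward if terminal else reward + GAMMA * max(q[(period, stock)])
            q[state][action] += ALPHA * (target - q[state][action])
    return q
```

After training, a greedy price for any state can be read off as `PRICES[q[state].index(max(q[state]))]`. In a multi-agent setting the demand faced by the learning retailer would also depend on the prices of its competitors, which this sketch does not capture.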