Artificial intelligence (AI) holds promise for improving and supporting medical diagnostics. However, the design of an AI system can strongly affect its performance in clinical applications. This study evaluates the impact of different design parameters, including image pre-processing, network architecture, loss function, transfer learning, and data augmentation. As a case study, we focus on predicting age from retinal fundus images, since the predicted retinal age has been shown to be an important biomarker for general health screening. Considering all parameter combinations, we developed 144 models using UK Biobank images. Our preliminary analysis shows that network architecture and training strategy critically influence performance, highlighting the need to select them carefully for the specific task and the available data. Moreover, we demonstrate that certain image pre-processing approaches may not generalize well across tasks and are not optimal for retinal age prediction. Our findings offer insights for standardized AI system design in medicine, contributing to improved reproducibility and reliability in the clinical domain.