Describe the process and components of Reinforcement Learning from Human Feedback (RLHF) in the context of training large language models (LLMs). Discuss how RLHF incorporates key elements such as reward model training and Proximal Policy Optimization (PPO). Furthermore, explore the challenges of aligning LLMs with human preferences using RLHF, and evaluate the limitations of this approach. What alternative methods are being explored for improving alignment in LLMs?
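
For concreteness, the following is a minimal, hedged sketch of the two mechanisms the question names: the pairwise (Bradley-Terry) loss commonly used to train a reward model, and a clipped PPO surrogate objective with a KL penalty toward a frozen reference policy. The tensor shapes, function names, and toy random inputs are illustrative assumptions only, not any particular library's RLHF API.

```python
# Illustrative sketch (assumed shapes and names) of two RLHF ingredients:
# (1) a Bradley-Terry pairwise loss for reward model training, and
# (2) a clipped PPO objective with a KL penalty toward a reference policy.
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise loss: push the reward of the preferred response above the rejected one."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def ppo_objective(logprobs_new: torch.Tensor,
                  logprobs_old: torch.Tensor,
                  logprobs_ref: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2,
                  kl_coef: float = 0.1) -> torch.Tensor:
    """Clipped PPO surrogate minus a KL penalty toward the frozen reference model."""
    ratio = torch.exp(logprobs_new - logprobs_old)            # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_term = torch.min(unclipped, clipped).mean()        # pessimistic surrogate
    kl_penalty = (logprobs_new - logprobs_ref).mean()         # rough KL estimate
    return -(policy_term - kl_coef * kl_penalty)              # loss to minimize

# Toy usage with random tensors standing in for model outputs.
if __name__ == "__main__":
    chosen, rejected = torch.randn(8), torch.randn(8)
    print("reward-model loss:", reward_model_loss(chosen, rejected).item())

    lp_new, lp_old, lp_ref = torch.randn(8), torch.randn(8), torch.randn(8)
    adv = torch.randn(8)
    print("PPO loss:", ppo_objective(lp_new, lp_old, lp_ref, adv).item())
```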