Speech can easily be manipulated through techniques such as text-to-speech synthesis, voice conversion, replay, tampering, and adversarial attacks. However, when the manipulation is applied to only a small portion of an audio recording, the remaining real segments can dominate the perception of human listeners and make machine detection extremely challenging. Therefore, there is an urgent need to explore this scenario, in which synthetic speech is embedded within otherwise real audio.