I've already covered the one that usually applies to UFO sightings. As several people who wrote after me noted, including you, apparent size can be deceiving. You brought up another one, which is context. We perceive entire scenes as a whole. Apparent size and object identification tend to occur simultaneously.
I alluded to the relevant empirical science. One of the entertaining bits is where experimenters build accurate models of familiar objects, but at greatly altered scale. These models are then viewed from different actual distances, and subjects are asked to estimate the distance to the object. To test contextual cues, some subjects are shown photographs of coherent scenes, while others are shown the same scene with grid sections of the photo cut apart and rearranged. Each group is asked to locate a particular kind of object in the scene, or make other determinations that test the perception of rectification. The jumbled scenes mess people up.
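To put a number on why the altered-scale models work, here is a minimal sketch of the geometry only, not of anything the researchers actually ran. The function name and the example sizes and distances are my own assumptions for illustration. It shows that an object and a scaled copy of it subtend the same visual angle whenever size and distance scale together, so the angle alone can't tell you which one you're looking at.

```python
import math

def angular_size_deg(width_m, distance_m):
    """Visual angle, in degrees, subtended by an object of the given
    width seen face-on at the given distance (plain trigonometry)."""
    return math.degrees(2 * math.atan(width_m / (2 * distance_m)))

# Hypothetical numbers: a 0.3 m model at 10 m versus a 30 m object at 1000 m.
near_model = angular_size_deg(0.3, 10.0)
far_object = angular_size_deg(30.0, 1000.0)

print(f"model at 10 m:    {near_model:.3f} degrees")
print(f"object at 1000 m: {far_object:.3f} degrees")
```

Both lines print the same angle. Without context or some other cue, the eye gets identical input from the two cases.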
The depth cues people customarily name when asked turn out to work only at surprisingly short range, just 20 meters or so. We can form several hypotheses for how this evolved, but I'm not an evolutionary biologist so I won't cover them. Binocular vision gives us
retinal disparity, the difference in the image seen by the left and right eyes. Surprisingly this is not the most important effect of binocular vision. More important is
ocular convergence, the degree to which your eyeballs have to point inward to put the object in the corresponding area of each retina. This is a muscle-feedback cue. So is
focal feedback, how much your lens has to change shape to render an object in sharp focus.
A very underrated cue is simple
parallax. This does not require binocular vision. Your eyeballs don't lie along the axes on which your skull rotates as you move your head. Hence with each movement of your head -- even very small movements -- your eyeballs move to a different position in space, even when they re-orient to keep you looking at the same object. This also occurs in peripheral vision, so even if your eyes orient with your head, the scene changes in periphery. Parallax is generally more pronounced than most people intuitively believe, so even small changes in the position of your eyes amounting to handfuls of millimeters produce a noticeable parallax change. This is the primary near-field depth cue for people who have lost sight in one eye.
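For a rough feel of why these cues fade with distance, here is a back-of-the-envelope sketch. It is simple triangle geometry under my own assumptions, not a model of how the visual system works: the 63 mm eye separation and the 10 mm head shift are illustrative values I picked, and the function names are hypothetical.

```python
import math

EYE_SEPARATION_M = 0.063  # assumed typical adult value, for illustration only
HEAD_SHIFT_M = 0.010      # assumed 10 mm sideways movement of one eye

def convergence_angle_deg(distance_m, eye_separation_m=EYE_SEPARATION_M):
    """Angle between the two lines of sight when both eyes fixate
    a point straight ahead at the given distance."""
    return math.degrees(2 * math.atan(eye_separation_m / (2 * distance_m)))

def parallax_shift_deg(distance_m, shift_m=HEAD_SHIFT_M):
    """Change in direction to an object, relative to a very distant
    background, when the eye moves sideways by shift_m."""
    return math.degrees(math.atan(shift_m / distance_m))

for d in (0.5, 2.0, 20.0, 200.0):
    print(f"{d:6.1f} m  convergence {convergence_angle_deg(d):7.3f} deg"
          f"  parallax {parallax_shift_deg(d):7.3f} deg")
```

Both angles fall off roughly in inverse proportion to distance. At half a meter a 10 mm shift moves the object more than a degree against the background, while at a couple hundred meters the same shift produces only a few thousandths of a degree.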
We have way too much confidence in our own perceptions and experiences.
And interpretations. People want to believe they have interpreted observations correctly. They don't want to accept that they were "fooled."
One of the most common UFO reports in my neck of the woods is seagulls, specifically the California seagull, our state bird. The California seagull, like many similar birds, is principally white with dark patches. The top surface of its wings is gray, but the underside of its wings (save for feather tips) and its breast are pure white. Seen at great distance against a mountain background, a flock of seagulls will seem to appear and disappear entirely as they change direction. If they wheel in unison, as flocks of birds do, and display their undersides, they will suddenly "appear" against a dark mountain. If you see their topsides against a mountain, they are quite effectively camouflaged.
"Whaddya mean seagulls?" say many reporters. "It's our state bird, fer cripes' sake. You think I can't recognize a seagull when I see one?" Well, yes, ninety-nine times out of a hundred a seagull (or a flock of them) will look like a seagull (or a flock of them). It's the one time in a hundred that catches you by surprise, because you weren't expecting that particular aggregate behavior.