It also may simply not have access to information about meta-level censorship. It sees the prompt and answers it. If it has been trained in a particular way, or its outputs are passed through a filter, it can't tell you anything about either.
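To make that concrete, here's a minimal sketch (all names hypothetical, not any real vendor's API) of why a post-hoc output filter is invisible to the model: the filter runs after generation, entirely outside the model's context, so there is literally nothing for the model to introspect.

```python
def model_generate(prompt: str) -> str:
    # Stand-in for an LLM call; the model only ever sees the prompt.
    return f"Uncensored answer to: {prompt}"

def output_filter(text: str) -> str:
    # Applied after generation, outside the model's view.
    banned = ["forbidden topic"]
    for term in banned:
        text = text.replace(term, "[redacted]")
    return text

def serve(prompt: str) -> str:
    raw = model_generate(prompt)   # the model's actual output
    return output_filter(raw)     # what the user actually receives

if __name__ == "__main__":
    print(serve("tell me about the forbidden topic"))
    # Asking the model "are you being filtered?" just triggers another
    # generation, which passes through the same opaque filter.
```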
I agree. At the end of the day, unless you can get it to reference something it could only have known if X were true, the most you can extract is an educated guess that it has been specifically engineered to act a certain way. Plausible deniability is key here; it's what lets businesses gaslight customers and get away with it.