Control characters in text file




















How do I read a text file's hidden characters?

Asked 11 years, 10 months ago. Active 5 years, 8 months ago.

I've created a text file from an application that I developed. How can I have the same in Windows?

Peter Mortensen

See unicode. On a Mac, also see the file command, to tell if a BOM is present.

By the way, a quick way to enforce a wrong encoding on any computer: open the file in a browser and explicitly set the encoding to ISO just like you did in TextMate.

Next, do a View Source for the file.

@Arjan: not just a Mac. And anyhow, the xxd -p command shows the actual bytes, so it is much better.

One of more, less, or the terminal emulator driving the window is converting some non-ASCII character that Microsoft Word put in there, to make it readable on screen.
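For anyone without xxd to hand (on Windows, say), here is a minimal Python sketch of the same idea: print the raw bytes as plain hex so nothing on the way to the screen can reinterpret them. The function name `plain_hex` and the sample bytes are made up for illustration, and the 30 bytes per line only roughly imitates the `xxd -p` layout.

```python
def plain_hex(data: bytes, width: int = 30) -> str:
    """Return the bytes as lowercase hex, `width` bytes per output line."""
    hex_text = data.hex()
    step = width * 2  # two hex digits per byte
    return "\n".join(hex_text[i:i + step] for i in range(0, len(hex_text), step))


# Hypothetical sample: a UTF-8 BOM, the text "hi", and a Windows line ending.
sample = b"\xef\xbb\xbfhi\r\n"
print(plain_hex(sample))
```

To look at a real file, read it in binary mode first, for example `plain_hex(open(path, "rb").read())`, so that no decoding happens before the dump.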

If the OP would cat the file and post the text, and then wc the file and post that output, you would be able to see that the text on the screen is longer than the text in the file. So it is very unlikely you can figure out how to stuff that particular character into a command like sed.

Was it created using Notepad on Windows? If not, what text editor was used to create the file?

Asking because I can use this info to reproduce the issue here and test some possible solutions. Best regards.

Just for your information, from the catdoc description: it reads an MS-Word file and puts its content as plain text on standard output. Clearly, it uses a set of conversion definition files, and the default ones do not deal with your case.

So you need to define your own configuration. Which means you need to do a diagnostic output of the file so you know which Unicode characters you have to deal with. Well, I would view an awk script as a special case of a substitution definition, with the Unicode characters embedded in the awk program.
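A diagnostic pass of that kind could look something like the Python sketch below. It is not the od output requested in this thread, only an alternative view of the same information: every non-ASCII character in the text, with its code point, its Unicode name, and how often it occurs. The function name `non_ascii_report` and the sample string with curly quotes are assumptions for illustration; the real file may contain quite different characters.

```python
import unicodedata
from collections import Counter


def non_ascii_report(text: str) -> list:
    """List (code point, Unicode name, count) for each non-ASCII character."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in sorted(counts.items())
    ]


# Hypothetical sample containing curly quotes of the kind word processors insert.
sample = "\u201cIt\u2019s fine\u201d"
for code, name, n in non_ascii_report(sample):
    print(code, name, n)
```

The file has to be decoded with the right encoding before this is useful, so the byte-level dump remains the more reliable first step.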

You are right back where we were 18 days ago and 15 replies ago: please run od as I asked and show us the actual character codes you are dealing with. There are no magic fixes, because Word formats are proprietary and binary and they can make them as cryptic as they feel like. Here are several ways to do it; pick the one you are most comfortable with. Try it in VI first to ensure the correct command. Once that works, the stream method will work. I believe that in VI, what you see are the actual interpreted characters as best a Unix can determine.

Sure, the Ctrl-V is a literal escape for the Ctrl-Q. But the M and minus will go into the string without any escapes. They make no effort to deal with the vagaries of Word Unicode bytes. The output from vi is inconsistent. Even when it renders an unusual character consistently, the terminal emulation software (the thing that rasterises stuff into the xterm) has the final say on escape sequences. You also omitted to remind us that there are a number of different metacharacters in the input, and it would be nice to convert them all at once rather than scan the entire data set multiple times.

Also, it does not need to use patterns, so it is an order of magnitude faster: it just sets up a character remap with entries and does an indexed look-up.
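The remap idea can be sketched in Python with `str.maketrans` and `str.translate`, which convert every listed character in a single pass without any patterns. This is not the tool discussed above, just an illustration of the technique. The table contents are hypothetical, since the thread never established which characters the file really contains, and Python uses a dictionary lookup here rather than a fixed-size indexed array.

```python
# Hypothetical mapping: the characters in the real file are not known here.
REMAP = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_plain(text: str) -> str:
    """Replace every mapped character in one pass over the text."""
    return text.translate(REMAP)


print(to_plain("\u201cIt\u2019s done\u2026\u201d"))
```

Characters that are not in the table pass through unchanged, so the table can be extended one entry at a time as the diagnostic output turns up new characters.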

Hope the subject will be closed though I know… Franck. Weird that Notepad also seems to convert two single quotes into a pair of left and right apostrophes.

Windows is even more cranky than I thought. Clearly the OP is mistaken in thinking the file was created in Word. Maybe several attempts with subtly different results. Yeah, here it is: The OP responded by asking a new question!

There are 9 more replies in this one. But checking with Word and comparing with od is better.

Want to know more about line termination characters? Just head on to Wikipedia.

There are multiple ways to find the Control-M (carriage return, CR) character in a text file. When moving a Windows file to Unix, you need to remove this CR character. When typing it, V should be first (that is, Ctrl-V before Ctrl-M). FTP software can also help by doing explicit character conversion.
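As one more way to do it, here is a small Python sketch that counts the CR bytes in a file's contents and converts Windows CRLF line endings to Unix LF. The function names and the sample bytes are made up for illustration. It works on bytes rather than decoded text, so the file's encoding does not matter for this step.

```python
def count_cr(data: bytes) -> int:
    """Count carriage-return (Control-M, 0x0D) bytes."""
    return data.count(b"\r")


def crlf_to_lf(data: bytes) -> bytes:
    """Convert Windows CRLF line endings to Unix LF, leaving lone CRs alone."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical sample with Windows line endings.
sample = b"first line\r\nsecond line\r\n"
print(count_cr(sample))
print(crlf_to_lf(sample))
```

Only CR bytes that are directly followed by LF are removed, on the assumption that a lone CR might be intentional; replacing `b"\r"` with `b""` instead would strip them all.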


