Radio - Servo - Gyro - Gov - Batt > Starting a Home-Brewed PCM Receiver Project
 
 
w.pasman
Elite Veteran
Location: Netherlands

Hi MarkF!

Thanks for the explanations!

"I'm sensing some skepticism in your answers, and that's OK - I'll answer it with results, for the code will speak for itself. One of the reasons that this is taking as long as it is is that I am very heavily documenting the code, as I usually do, specifically to help allow other folks to understand how this works."

No, don't get me wrong: I greatly appreciate it, and it greatly adds to your confidence and credibility in building software. The problem is that it takes a lot of expertise to use it and to interpret the results properly. You are probably the only one on earth who can do this anyway. Anyone else will just look at your reports and not try to run the software, unless it finally becomes a fully working system that they can use in practice, for instance to fly their thing.

"As I mentioned, this was a measurement of the raw speed of the code, taken by having the event_delay routine immediately return, instead of waiting for the proper event time. This is a good way of measuring total processing requirements in a real-time system (ignoring hot spots, which are dealt with individually)."

Yes but you didn't explain the event_delay routine (or did I miss that?) so that never made sense to me. If you talk about "total processing requirements" this suggests that you mean that you suppose all frame data to be available at once. But I can't really believe you mean that because the issue is to do as much processing during time that doesn't count - the time before the CRC has arrived and the frame is deemed OK. Furthermore, you wanted to do as much as possible waiting in parallel. So do I understand this right that 410us is from the moment the full frame has arrived till the moment a servo that would be in the '0'-position has been informed? I think I miss something still....



""Be careful not to overestimate the gains from assembly code. Gains up to 3 may be realistic, above that not."
I'm sorry, but that is incorrect...."

Mmm okay you have different experience from what I saw. Maybe it also depends on the platform and compiler. I'm used to gcc etc, the 'main stream' compilers, but it may be different from special purpose and/or cross compilers.

"No, I'm not using Futaba's scheme at all. I'm currently using my own 16-channel data format, ....."
Yes sorry; but the main composition is the same: half of the frame straight coded, the other half differential; then some extra bits and a 16 bits CRC.
The problem I have with that format is the single 16bit CRC at the far end. Seems a lot of latency...


" For emphasis, the times above are the latencies after a frame is received - the worst case in that context is a subframe time plus one RF sample time, or about 14.7 mS until the end of all PCM code 0 servo pulses, for a 16-channel receiver. Remember, too, that all servos are driven simultaneously in my setup, so there is no additional subframe delay as the receiver staggers the servo pulses, as in other systems."

AAAAAAAH !!!! [ so there is the clue. ] So why do you write "More importantly, that's from the start of the frame, not the end of the frame! " The start of the frame means for me the moment the receiver gets the start of the frame, and it could even mean the moment the transmitter starts working on the construction of the next frame. Now I'm curious what your interpretation is of the start of the frame.



" For emphasis, the times above are the latencies after a frame is received - the worst case in that context is a subframe time plus one RF sample time, or about 14.7 mS until the end of all PCM code 0 servo pulses, for a 16-channel receiver. Remember, too, that all servos are driven simultaneously in my setup, so there is no additional subframe delay as the receiver staggers the servo pulses, as in other systems."

That sounds a lot more realistic than the more airtronix-style latencies you mentioned at other times. Impressive result! It will bring performance in the same latency region as PPM currently on Futaba, but then with the reliability of PCM!

But let me sketch you a scenario where I think your latency counting goes wrong. Assume I throw the stick in the corner a microsecond AFTER the frame was sampled in the transmitter. Then I have to wait for the NEXT frame (+12ms) till my data is sampled and put in a frame for transfer. The frame is fully arrived at the receiver after another 12ms, and then we have your mentioned 1ms or so extra calculations. So that makes 25ms? Am I right here?


" As I'd mentioned in the Airtronics thread, it's definitely possible to change to Hamming codes for error correction, which would remove the frame-based latency, and then you'd have an individual channel latency. In this case, including the time to communicate the channel code and the ECC bits, the worst-case minimum latency from the last sample of a channel would be ~2.2 mS with FSK, or about 1.5 mS with FQPSK. However, the maximum latency would be worse than what I have now (there would be more ECC bits than CRC bits, so the subframe time would be longer)."

I don't think so. [I agree with the longer subframe time but not with the maximum latency] Consider the above scenario. Again I throw the stick in the corner a microsecond after the channel was sampled. As before I have to wait till next time, which is now about 8*2.2=17.6ms later. The frame is transmitted and decoded, another 2.2ms later, and a few micros my servo is updated (in this case nothing has to be done in parallel as the servo data comes in one by one). That makes a total of 19.8ms, lower than your 25ms.

It stays tricky this latency business.....
Cheers!
11-01-2003 Over year old.
 
 
MarkF
Senior Heliman
Location: Palo Alto, CA

How to Write 10X Faster Code

Hi, Gang!

Warning: We're about to get into coding tips and techniques here, which is really far afield from helicopters. Skip to the next message if you're not interested in this!

These days, there aren't too many of us left that are real efficiency fiends. Of those that are, most started in the early days of computing when almost any "real time" project was a major challenge (I started back with the 8008 and the Signetics 2650)! While this is off the topic of the receiver, per se, I'll describe some of the techniques that I've learned over the years, which start well before contemplating assembler language.

Before we get started, my receiver project is structured around what I call "Events", that is either an Input (reading the pin connected to the RF portion of the receiver), or an Output Set (driving servo pins High), or an Output Clear (driving servo pins low), so I'll be using that terminology below.


Know When to Break The Rules!

One of the first, and most important techniques for improving efficiency is to know when, and how, to break the rules of modularity. Computer Science profs will drill into you over and over again how important it is to maintain strict modularity guidelines like data hiding, abstraction and isolation, etc, and they are wise to do so! However, as Ralph Waldo Emerson once wrote, "A foolish consistency is the hobgoblin of little minds". When you're getting started, there is no question whatsoever that strict modularity is the way to go. However, as you gain more experience, you'll begin to identify times when strictly enforcing modularity can seriously hinder performance.

One of the best-known examples of this is "TCP/IP Fast Forwarding", which is central to the Internet. To begin with, TCP/IP itself does not adhere rigidly to the standard OSI network model, it breaks modularity for performance. What Fast Forwarding does, though, is to let the lowest-level network code reach up two layers into the network stack directly, to handle the 95% case of simple network forwarding. By doing this, CPU overhead can be dropped by an order of magnitude.

Overdoing this can easily wind you up with a spaghetti mess of unmaintainable code, so the challenge is how to do it well and appropriately. Unfortunately, this mostly takes experience, for it's difficult to specify a hard and fast set of rules which tell you when to do this. However, when performance demands it, modularity-breaking really can be appropriate. Along those lines...


Design Your Data Structures for the Inner Loop

When I tackle a performance-critical program, I'll start with how I store data first, keeping in mind how the inner loop will access the data. While multiple nicely-coded structures can be pretty, it can also result in far less efficient data accesses. The receiver uses a simple linear array of packed structures that allows each event to be read by the event code in just one instruction, instead of the 10X more it might take with structures that aren't designed for access efficiency.
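As a rough illustration of the layout described above, here is a minimal C sketch of a linear array of packed event records. The names event_t, delay, outputs and total_delay are hypothetical and are not taken from the receiver code; the only point is that each record is two adjacent 32-bit words, so a single load-multiple instruction could fetch both fields.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical event record: two adjacent 32-bit words, so that one
   load-multiple instruction could fetch both fields at once. */
typedef struct {
    uint32_t delay;    /* timer ticks until the event fires */
    uint32_t outputs;  /* bit pattern for the servo pins */
} event_t;

/* Walk the linear event list and add up the scheduled delays.
   Access is strictly sequential: no pointer chasing, no lookups. */
uint32_t total_delay(const event_t *ev, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += ev[i].delay;
    return sum;
}
```

A linked list or an array of pointers to separately allocated records would hold the same information, but each step of the inner loop would then need extra loads just to find the next record.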


Global Variables Aren't Necessarily Evil!

OK, so maybe you'll prefer me to say that global variables are an evil necessity! There are many times when global variables are important for performance enhancement, and this is definitely the case here. As one example, I could be passing around the "time_base" variable that determines the timing of every action in the receiver, but why? Instead, this is a global variable, as are many of the others.

Note that this, too, shouldn't be overdone, since your goal is still to try to be as modular as possible. However, the key is to recognize when modularity is getting in the way, rather than helping out!


Use Fast Algorithms

After you've designed the data structures, use smart algorithms. The most tightly coded assembler-language bubble sort routine will probably be blown away by a C quick-sort. Along those lines, the add_event() routine uses a binary search and an insertion sort to efficiently add new events to the event lists.
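For readers who want to see the shape of such a routine, here is a minimal sketch of a binary search followed by an insertion into a sorted array. The add_event() name comes from the post, but the signature, the event_t layout and the find_slot() helper are assumptions made for this example, not the actual receiver code.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    uint32_t time;     /* absolute event time in timer ticks (assumed field) */
    uint32_t outputs;  /* pin pattern (assumed field) */
} event_t;

/* Binary search: index of the first entry whose time is greater than t. */
static size_t find_slot(const event_t *list, size_t n, uint32_t t)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid].time <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Insert ev so the list stays sorted by time.
   Returns the new count, or n unchanged if the list is full. */
size_t add_event(event_t *list, size_t n, size_t cap, event_t ev)
{
    if (n >= cap)
        return n;
    size_t pos = find_slot(list, n, ev.time);
    /* Open a gap by moving the tail up one slot. */
    memmove(&list[pos + 1], &list[pos], (n - pos) * sizeof list[0]);
    list[pos] = ev;
    return n + 1;
}
```

The search costs O(log n) comparisons; the move is still O(n), but for short event lists it is a single block copy.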

Think first... about algorithms. THEN start coding!


Make the Tightest Inner Loop You Can

Still in the realm of algorithm design, the next step is to move whatever you can outside of the inner loop code. Simple examples include things like counters, where you count the number of bytes received as you receive them. Instead of doing this, move the count determination outside of the loop. The goal here is to pare away every operation other than that which is absolutely essential to the core problem.
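The byte-counting example can be sketched as follows. Both functions are hypothetical illustrations, not receiver code. The first keeps a counter inside the loop; the second derives the same count once, after the loop, from the pointers it already has. A good optimizer may do this transformation on its own, so treat the sketch as a statement of intent rather than a guaranteed speed-up.

```c
#include <stdint.h>
#include <stddef.h>

/* Counting inside the loop: one extra increment per byte. */
size_t copy_counting(uint8_t *dst, const uint8_t *src, const uint8_t *end)
{
    size_t count = 0;
    while (src < end) {
        *dst++ = *src++;
        count++;
    }
    return count;
}

/* Count moved out of the loop: computed once from the pointers. */
size_t copy_hoisted(uint8_t *dst, const uint8_t *src, const uint8_t *end)
{
    const uint8_t *start = src;
    while (src < end)
        *dst++ = *src++;
    return (size_t)(src - start);
}
```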


Remove Special Cases

The next key thing to do is to try to make everything in the inner loop as consistent as possible. Wherever you can, check for special cases outside of the inner loop, rather than inside the inner loop. Comparisons and branches are extremely slow!

The consequence of this is that you'll want to make everything handled by the inner loop as consistent as possible. In the case of the receiver, the code always loads both a delay length, and an output value - even for input events! The reason for this is that it is far faster to always read both than it is to check to see if this is an input event and just read the delay value.


Eliminate Library Calls

Here's one of the first tricks that are relatively unique to assembly language. Wherever possible, don't call library routines in an inner loop, instead perform them inline. A perfect example in the receiver is the rf_input() function, which needs to keep a 64-bit long buffer of the most recently received samples. The C compiler forces a library call to accomplish the shifting of the "long long" rf_sample value, even if you specify inlining (and that's not at all unusual). By replacing this entire call with two ARM7 "MOV xx,yy,RRX" instructions, we save more than an order of magnitude here.


Use Data Types that C Doesn't Support

Continuing with the rf_input() function, try to use unique features of the processor that C doesn't support. A perfect example is the processor's carry flag, which essentially allows you to have a 65-bit data type. Accomplishing an efficient input in assembler requires just one instruction to load the Input register, one to shift the input into the carry flag, and two to do the 64-bit shift.

Since C doesn't support this data type, you have to replace it with the before-mentioned library call, the Input read, an immediate read of a mask value, an AND operation, a Shift operation, and an Or operation. Much slower!
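For comparison, the C side of this might look like the sketch below. It is only an assumed equivalent of what rf_input() has to do: the function name rf_shift_in, the RF_PIN_MASK value and the choice to feed new samples in at the top bit (matching the direction of a rotate-right through carry) are all assumptions for illustration.

```c
#include <stdint.h>

#define RF_PIN_MASK 0x00000100u  /* hypothetical: RF data on bit 8 of the port */

/* Shift the newest RF sample into a 64-bit history buffer.
   'port' stands in for a read of the input register. */
uint64_t rf_shift_in(uint64_t history, uint32_t port)
{
    uint64_t bit = (port & RF_PIN_MASK) ? 1u : 0u;  /* mask and test the pin */
    return (history >> 1) | (bit << 63);            /* 64-bit shift, then OR */
}
```

On a 32-bit CPU the 64-bit shift has to be synthesized from 32-bit operations, which is where a carry flag that links the two halves pays off.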


Use The Instruction Set!

The next thing to learn is to carefully study the instruction set of the processor that you are using, and to seek ways of exploiting its unique abilities. A great example here is the first instruction that starts each event in the receiver: "LDMIA R4!,{R2-R3}". This single opcode loads both the duration of an event, as well as the outputs to be written, and it advances the data pointer to the next event in the event list.

Another, and even more significant example, is that the ARM instruction set supports conditional execution. That is, it lets the processor execute an instruction if and only if the processor flags are set in a particular way as the result of a previous operation. Before I wrote the event compiler, I used this capability to allow the event handler to process all three different types of events with no branches, by loading the event_type into the processor flags and then conditionally executing the I/O instructions (this is also another example of removing special cases).
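C has no direct way to ask for conditional execution, but the same "no branches on the event type" idea can be approximated with bit masks. The sketch below is an assumption-laden illustration: the EV_* encoding and the apply_event() function are invented here, and whether the compiler emits truly branch-free code depends on the compiler and target.

```c
#include <stdint.h>

enum { EV_INPUT = 0, EV_SET = 1, EV_CLEAR = 2 };  /* hypothetical encoding */

/* Apply one event to the pin state without an if/else on its type.
   Each mask is all-ones when its type matches and all-zeros otherwise. */
uint32_t apply_event(uint32_t pins, uint32_t type, uint32_t outputs)
{
    uint32_t set_mask   = (uint32_t)0 - (uint32_t)(type == EV_SET);
    uint32_t clear_mask = (uint32_t)0 - (uint32_t)(type == EV_CLEAR);
    pins |=  (outputs & set_mask);    /* Output Set: drive selected pins high */
    pins &= ~(outputs & clear_mask);  /* Output Clear: drive selected pins low */
    return pins;                      /* Input events leave the pins untouched */
}
```

Every event type runs exactly the same instructions, which is the property that makes the timing predictable.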

Similarly, the ARM lets you decide on an instruction-by-instruction basis whether you want to have the operation affect the processor flags, it has incredibly powerful shift capabilities, etc. Spending the time to learn these capabilities in-depth can make a big difference!


Eliminate Temporary Variables

One huge benefit of assembly language is that it lets you eliminate temporary variables that compilers will require. For example, very few C compilers will generate an instruction like "LDR R0,[R0]", in which case a register that's pointing to something is loaded with the value of what it's pointed to. Using methods like this reduces the number of registers required, which can prevent running out of registers. This can lead to not needing to store values in memory, which is a very good thing!


Eliminate Loads and Stores

Compilers rarely are able to structure complex routines such that no memory references are required. However, if you very carefully analyze the flow of data in your inner loop, you can usually manage to keep all intermediate values in registers, rather than in memory. While this isn't as important on the chips we're using here, this can be enormously important on fast X86 machines, where a single memory reference can result in the loss of hundreds of instruction execution opportunities. With processors like those, just eliminating a single "uncached" memory reference can speed up code nearly two orders of magnitude!


Write a Compiler

When all of these techniques don't meet your expectations, you can always do what I did, which is to write a compiler that generates its own machine code on-the-fly! This is perhaps the ultimate performance-enhancing trick, for it results in the fastest possible execution speed. It's also not as hard as you might think - the code I'll be publishing will show you one way it can be done.


Summary

After all is said and done, it really is possible to deliver 10X speed-ups in many situations by using assembly language. However, choose your opportunities carefully, for writing this kind of code requires far more time than normal C hacking. By carefully designing the entire program so that the inner loop is minimized, it's then worth the effort to optimize the hell out of it!

Have Fun!
MarkF

Addendum: While it isn't relevant to what we're doing here, I'll mention another approach that's extremely important when working on the "big" CPUs like the P4 and the Opteron. These processors execute multiple instructions simultaneously, and each instruction has a variable amount of execution time. When an instruction needs the output of a prior instruction that hasn't yet completed, it will force the CPU to "stall", wasting cycles until the prior instruction completes. While the compiler will try to "schedule" instructions in an order that minimizes execution time, no compiler can come close to what humans are capable of. In several cases, we've created routines that execute the first half of an inner loop, then we've interleaved, or "folded together" the second half of the inner loop with another copy of the first half of the inner loop, the purpose of which is to ensure that the processor is never stalled. This can have an enormous impact on performance. In one nice case in the AMD Opteron Performance Optimization manual, AMD shows that it takes 16 clock cycles to perform a single complex multiplication on the Opteron processor. However, if you very carefully write the code, it's possible to perform four complex multiplications in 17 clock cycles!!! Now, writing this kind of code is definitely in the "advanced" category, but it's a powerful technique that can have a major impact on performance-critical code!
11-01-2003 Over year old.
 
 
MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, w.pasman!

Sorry it took me so long to write the "10X" note, I didn't see your response in-between! I'll address your questions below:

Yes but you didn't explain the event_delay routine (or did I miss that?) so that never made sense to me.

Sorry if I wasn't clear! When you are developing a real-time program, one interesting metric is how much total processing time is required to do everything if you never had to wait for an external event. That's what I did: I just measured the code's speed as it executed all of the processing for a subframe without waiting for RF inputs, or for proper servo timing. This lets you get a feel for how much excess processing time is available. I find it useful...
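In code, the measurement trick amounts to something like the sketch below. The event_delay name is from this thread, but everything else (the benchmark_mode switch, the fake_timer stand-in for the hardware timer, and the busy-wait loop) is assumed for illustration and is not the real routine.

```c
#include <stdint.h>
#include <stdbool.h>

static bool     benchmark_mode = false;  /* hypothetical switch */
static uint32_t fake_timer = 0;          /* stands in for the hardware timer */

/* Each read advances the stand-in timer by one tick. */
static uint32_t read_timer(void) { return fake_timer++; }

/* Wait until the timer reaches 'deadline'. In benchmark mode, return at
   once, so the run time is pure processing with no waiting. */
void event_delay(uint32_t deadline)
{
    if (benchmark_mode)
        return;
    while ((int32_t)(read_timer() - deadline) < 0)
        ;  /* busy-wait for the proper event time */
}
```

Timing one subframe's worth of processing with the switch on gives the total processing requirement; comparing that with the real subframe time shows how much headroom is left.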

My guess is that you're confusing this early measurement with the current results in actual operation, and they're very much "apples and oranges" (i.e. not comparable). One figure talks about how much processing the entire body of code represents, while the other talks about how an individual section of the code responds to a specific event.

Mmm okay you have different experience from what I saw. Maybe it also depends on the platform and compiler. I'm used to gcc etc, the 'main stream' compilers, but it may be different from special purpose and/or cross compilers.

Actually, I've done this with a wide variety of different compilers, on many different CPU chips. Right now, we're doing the same thing at work on AMD Opterons for our "big" receiver, as compared to GCC. Hopefully, my previous post will suggest a few of the techniques I use to do so. As it suggests, just replacing one C routine with assembler isn't often the most efficient way to go. Instead, structuring your entire program around the needs of the inner loop can provide the really significant performance speedup opportunities.

[The Futaba-like] main composition is the same: half of the frame straight coded, the other half differential; then some extra bits and a 16 bits CRC.
The problem I have with that format is the single 16bit CRC at the far end. Seems a lot of latency...


Actually, it's a 20-bit CRC, comprised of two 10-bit words, but I understand your point. The problem with distributing the error correction is that you can reduce the minimum control latency, but the worst-case control latency will be longer! As a specific example, assume a 4-bit preamble. If I were to replace my 20-bit CRC with four ECC bits every word, the frame length would increase from (4 + 9 * 10) = 94 bits to (4 + 7 * 14) = 102 bits. As we were discussing in the Airtronics thread, this also will lead to skewed control outputs versus a purely frame-based approach.
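The frame-length arithmetic can be checked with a one-line helper. The function name frame_bits is made up for this example; the numbers are the ones quoted above, reading the 9 words as 7 channel words plus the two 10-bit CRC words, and the ECC variant as 7 words of 10 + 4 bits each.

```c
/* Frame length in bits: preamble plus 'words' words of 'bits_per_word' bits. */
unsigned frame_bits(unsigned preamble, unsigned words, unsigned bits_per_word)
{
    return preamble + words * bits_per_word;
}
```

With a 4-bit preamble, frame_bits(4, 9, 10) gives the 94-bit CRC frame and frame_bits(4, 7, 14) gives the 102-bit ECC frame, about 8.5% longer.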

The start of the frame means for me the moment the receiver gets the start of the frame, and it could even mean the moment the transmitter starts working on the construction of the next frame. Now I'm curious what your interpretation is of the start of the frame.

Since I'm building a receiver, I'm reporting receiver latencies. Specifically, the "start of the frame" that I'm referring to here refers to the instant the RF input pin rises to a high level on the first bit of the preamble, which will be a few tens of microseconds after it leaves the transmitter antenna. Once I complete the receiver and start working on a transmitter, then I'll report on actual system-level latencies.

That sounds a lot more realistic than the more airtronix-style latencies you mentioned at other times. Impressive result! It will bring performance in the same latency region as PPM currently on Futaba, but then with the reliability of PCM!

I apologize if I haven't been clear enough in repeating that my latencies were from the end of the frame to the end of the first servo pulses. Once I can report on system-level latencies, this should be a lot easier for folks to follow! And... thank you!

But let me sketch you a scenario where I think your latency counting goes wrong. Assume I throw the stick in the corner a microsecond AFTER the frame was sampled in the transmitter. Then I have to wait for the NEXT frame (+12ms) till my data is sampled and put in a frame for transfer. The frame is fully arrived at the receiver after another 12ms, and then we have your mentioned 1ms or so extra calculations. So that makes 25ms? Am I right here?

Hmmm. Since I haven't been reporting on any system-level latencies, I can't say that I quite understand how I'm wrong about it. Actually, if the transmitter's mixing is fast enough, and I will aim for this once I start on mine, it'll resample all the inputs and perform the mixing each time an analog channel is about to be sent. This is completely independent of whether or not frame-level CRC or word-level ECC is used. While this will require 8X mixing capability (performing a full remix for every analog channel value that's sent), this would, in fact, lead to a worst-case system-level latency of about 16 mS with FSK, or about 11 mS with FQPSK!

If this were combined with ECC in place of CRC, the minimum control latency would become about 5 mS, but the maximum would be about 17 mS with FSK, or 12 mS with FQPSK. Yes, this does have the same disadvantage that the potential exists for control skews, but it delivers the lowest possible system-level latency that I can think of.

I'm still undecided as to whether I'll go with the ECC approach versus the CRC approach, since I lean towards minimizing the worst-case control latency for greater predictability (humans are good at automatically adjusting their responses for known latencies), as compared to reducing the minimum control latency but increasing the worst-case control latency. As I said in the Airtronics thread, we may ultimately have to build both and let the competition flyers decide which one they like best!

I don't think so. [I agree with the longer subframe time but not with the maximum latency] Consider the above scenario. Again I throw the stick in the corner a microsecond after the channel was sampled. As before I have to wait till next time, which is now about 8*2.2=17.6ms later. The frame is transmitted and decoded, another 2.2ms later, and a few micros my servo is updated (in this case nothing has to be done in parallel as the servo data comes in one by one). That makes a total of 19.8ms, lower than your 25ms.

I believe that I've just explained how it's possible to go even faster, above.

Best Regards,
MarkF
11-01-2003 Over year old.
 
 
w.pasman
Elite Veteran
Location: Netherlands

Hi Mark,

Thanks again for your elaborations!

" I'm still undecided as to whether I'll go with the ECC approach versus the CRC approach, since I lean towards minimizing the worst-case control latency for greater predictability (humans are good at automatically adjusting their responses for known latencies), as compared to reducing the minimum control latency but increasing the worst-case control latency. As I said in the Airtronics thread, we may ultimately have to build both and let the competition flyers decide which one they like best!"

There will always be jitter on the latency, at least with the size of 1 frame time. This is the difference between throwing the stick just before versus just after sampling has been done (of that stick of course).
The bad thing with the CRC at the end of the frame is that the codes in the start of the frame have to wait for the 'confirmation' that is nearly a full frame further away in time. This causes doubling of the worst case latency. (It doesn't double the jitter)

I wouldn't worry about the control skew and human compensation capabilities too much, considering the jitter in the latencies with Futaba. I'm not sure what they are doing but it looks pretty bad... So as we can fly with that, apparently a fluctuation below 50ms is small enough to ignore?

"Actually, if the transmitter's mixing is fast enough, and I will aim for this once I start on mine, it'll resample all the inputs and perform the mixing each time an analog channel is about to be sent."

Yes, you are right that it helps for the channels at the end of the frame, they are closer to the CRC and they will benefit later sampling. But the first channel is sampled a long time before the CRC and this first channel determines the (worst case) latency.

"This is completely independent of whether or not frame-level CRC or word-level ECC is used."

I don't follow this remark. I think you are trying to say that the latencies are the same for frame-level and word-level ECC. But it is the distance to the ECC/CRC that makes the difference in latency.

"this would, in fact, lead to a worst-case system-level latency of about 16 mS with FSK, or about 11 mS with FQPSK!"

I still don't agree. I would really recommend trying to understand my example. It's quite clear as I wrote it I think? Maybe I should put in some numbers.

With your frame layout:

T=0(ms): channel 1 is sampled for frame 0
T=1: stick of channel 1 is thrown into the corner
T=12: channel 1 is sampled for frame 1
T=13: channel 1 info arrives at receiver. Nothing happens yet
T=24: CRC for frame 1 has arrived
T=25: servo 1 has been set to corner position

With a per-channel CRC and 2.2ms per channel, this becomes

T=0: channel 1 is sampled for frame 0
T=1: stick of channel 1 is thrown into the corner
T=17.6: channel 1 is sampled for frame 1
T=19.8: frame 1 channel 1 arrives in receiver
T=21.8: (servo driving pulse lasts 2ms): servo has been set to corner.

What I try to say is that you really need a system perspective to decide the frame layout and to decide between frame- and wordlevel codes
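The two timelines above follow the same three-term sum, which can be written as a small helper. The function and parameter names are invented for this sketch, and the values are simply the round numbers used in the timelines, measured from the previous sample instant at T=0.

```c
/* Worst-case stick-to-servo latency in ms for a stick moved just after
   its channel was sampled. All parameters are illustrative. */
double worst_case_latency_ms(double resample_period_ms,  /* wait for the next sample */
                             double wait_for_check_ms,   /* from sample to verified code */
                             double output_ms)           /* processing or servo pulse */
{
    return resample_period_ms + wait_for_check_ms + output_ms;
}
```

For the frame-level CRC layout this gives 12 + 12 + 1 = 25 ms; for the per-channel code it gives 17.6 + 2.2 + 2 = 21.8 ms. The middle term is the one that the position of the CRC or ECC bits controls.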

An option that was not yet discussed is, how about setting the servos to the received position BEFORE the ECC/CRC/whatever is received? This will really give you the lowest latency! This will cause some jitter but you can always set the servo to an old position later. Both systems would benefit a bit from this.
Putting this to a more extreme point, I would even suggest putting far fewer error detection bits on the frames. Say you want 2^-10 error probability so you need 10 bits of code. Now you can put this on each and every frame, but you can also put just one bit in the frame and decide after 10 frames whether the last 10 frames were OK! If not you dig up the history and use the data from 11 frames ago. How about that! This would also solve the often heard complaint about PCM that it doesn't warn when noise is on the reception as you would have up to 10 frames of noise before it uses the old value.
11-02-2003 Over year old.
 
 
w.pasman
Elite Veteran
Location: Netherlands

Coert,

Good idea to search the patent database. I also looked there, and there's a lot more information there on futaba radio stuff.
One interesting thing is the description of the transmitter module (05850597)
If I look at the scheme they give for the PCM1024 in the patent you mention, I don't see the CRC and extra data bits (for the failsafe settings I presume). I didn't read the entire patent though. [So that may be the difference with PCM1024Z]
11-02-2003 Over year old.
 
 
w.pasman
Elite Veteran
Location: Netherlands

Angelos

I searched all the links mentioned here, but unfortunately none of them gives the PCM1024Z modulation/frame format...
So you figured it all out? Can you put the details somewhere, I would like to know how that works in detail. The differential coding used (is it linear, polynomial, exponential?), the extra data bits, the failsafe data, etc.
11-02-2003 Over year old.
 
 
MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, W.Pasman!

Sorry for the oversight! I just dropped a bit when I wrote about the 8X mixing approach, and completely forgot about waiting for the CRC before taking action! Duhhh. My apologies!

So, perhaps I'll add Hamming codes, after all! As per your suggestion about one bit (i.e. nothing other than parity), that will only work in a very high SNR environment, and this definitely isn't! The question I'll have to answer is whether to use a baseband coding scheme, or not. The current manufacturers are using this to prevent DC (i.e. long strings of zeros or ones), but there are other ways to accomplish this. Specifically, randomizers! What a randomizer does is simply to XOR [eXclusive OR] the data with the output of an LFSR [Linear Feedback Shift Register - essentially a random number generator], so that the data appears random. The same thing is done at the receiver (it uses the same LFSR function), which recovers the original data. The nice part about this is that it requires no additional bits for overhead, and is trivially fast (in fact, it'd only take 217 nS per word).
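A minimal sketch of such a randomizer is shown below. The 16-bit Galois LFSR with taps 0xB400 is a common textbook maximal-length choice picked for this example; the polynomial, the seed handling and the function names are assumptions, not a description of any manufacturer's scheme or of the receiver code. The 10-bit word width follows the frame format discussed earlier in the thread.

```c
#include <stdint.h>

/* One step of a 16-bit Galois LFSR (taps 0xB400, maximal length).
   The state must never be zero. */
static uint16_t lfsr_step(uint16_t s)
{
    uint16_t lsb = s & 1u;
    s >>= 1;
    if (lsb)
        s ^= 0xB400u;
    return s;
}

/* Collect the next 10 output bits of the LFSR into one word. */
static uint16_t keystream10(uint16_t *state)
{
    uint16_t k = 0;
    for (int b = 0; b < 10; b++) {
        k = (uint16_t)((k << 1) | (*state & 1u));
        *state = lfsr_step(*state);
    }
    return k;
}

/* XOR each 10-bit word with the keystream. Running this a second time
   with the same nonzero seed restores the original data. */
void scramble(uint16_t *words, int n, uint16_t seed)
{
    uint16_t state = seed;
    for (int i = 0; i < n; i++)
        words[i] ^= keystream10(&state);
}
```

The transmitter and receiver call the same function with the same seed, so no extra bits are sent; a long run of zeros in the data simply comes out looking like the LFSR sequence.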

The unknown question at this stage is whether baseband coding would be more resilient to errors than the ECC codes, and the only way to know that would be to build it or simulate it, and simulating it is one heck of a lot easier! Fortunately, I use MATLAB at work (a very powerful simulation tool), so one of these days, I'll set up a simulation of the RF environment, and see what kind of results we can get with different coding schemes over different SNRs.

As far as the current state of the receiver goes, I've been chasing a very odd jitter in the servo outputs of +/- ~50 nS. While this could be a code bug, the window of possibilities is shrinking pretty quickly, and it may well be an unusual "feature" of the Philips chip that isn't documented (one potential H/W source could be the synchronizer that probably exists between the peripherals - specifically the H/W Timer register - and the CPU core; even though it is counting at 60 MHz, it might not be possible to read the updated values every CPU cycle). Anyway, this is the most recent interesting distraction. I probably ought to stop wasting time on it, since this is only +/- 1/20th of a PCM step, but it is annoying after going to all the effort to eliminate jitter. Sigh... we'll see.

Unfortunately, I didn't make any progress on the receiver today, since I spent the day at a cellphone developers' conference, and brought home the new Borland Mobile Studio C++/Java development environment and a Nokia 3650 Symbian O.S. phone with - guess what - a ~60 MHz ARM7TDMI CPU - the exact same thing I've been developing the receiver on! While this is about as off-topic as it gets, the reason I went there was to see if a new application that I want to create would work or not. It will work, so we're actually going to be starting up another company as a spinout from my current company - Yippee!

Have Fun!
MarkF
11-03-2003 Over year old.
 
 
MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, Folks!

The more I think about it, the more I'm worried about the single-word ECC approach. The problem is the very nature of RF: errors normally occur in bursts, rather than as isolated single bits. This problem is made even worse when using QPSK, where differential encoding can lead to paired symbol errors, even though only one symbol may have been corrupted. To have a robust solution, it will almost certainly be necessary to support multiple bit error correction, leading to even higher ECC overhead.

This fundamental problem is why the vast majority of high-performance RF systems (including all digital broadcast TV, digital cable TV, satellite TV, most digital cellphones, and even CD and DVD players!) use a form of error correction known as Reed-Solomon codes. This form of coding is capable of correcting long strings of errors anywhere within a frame: the number of errors that can be corrected is determined by how many words of Reed-Solomon codes are appended to the end of the frame. Despite the potential latency advantages of single-word ECC, it just isn't used for RF, thanks to the bursty nature of radio frequency errors.

So, this'll take some real thought. Do we build a scheme that says "Screw it! Errors will occur, and we'll pray that we get a good frame once in a while", as current systems do, or do we rely on the advances in data communications and say "Accept the latency. Build a system that corrects the vast majority of errors that do occur"? The disadvantage of the second approach is that it would not just force the latency to be two sub-frames, it would also reduce the amount of overlap that's possible on the servo output pulse (the processing time required to perform the error correction when errors occur is pretty significant, which would reduce the amount of overlap, and possibly even eliminate the opportunity to perform the overlap). On the other hand, it sure would be nice to dramatically reduce, and nearly eliminate, the sorts of radio glitches that happen with today's systems from time-to-time.

Decisions, decisions - this one's tough!

Cheers!
MarkF
11-03-2003 Over year old.
 
 
w.pasman
Elite Veteran
Location: Netherlands

Hi Mark!

" As per your suggestion about one bit (i.e. nothing other than parity), that will only work in a very high SNR environment, and this definitely isn't! "

I don't understand this. I think the transmission is nearly flawless; how often do we really have dropped frames? So that would warrant parity bits I guess?

"The question I'll have to answer is whether to use a baseband coding scheme, or not. The current manufacturers are using this to prevent DC (i.e. long strings of zeros or ones), but there are other ways to accomplish this."

Can't advise you on odd jitter and baseband coding....

" It will work, so we're actually going to be starting up another company as a spinout from my current company - Yippee! "

Congrats! So you're really working while working on this transmitter

"The problem is the very nature of RF: errors normally occur in bursts, rather than as isolated single bits. This problem is made even worse when using QPSK, where differential encoding can lead to paired symbol errors, even though only one symbol may have been corrupted."

I don't understand why this is a problem: if bits are corrupted the whole frame (several words) is dropped, and probably even multiple frames if the burst is long enough. It has always been that way and we have been flying without problems. As I said, the transmission apparently works very well.
Also this burstiness argues against using ECC because in an error burst most likely too many bits are mangled to make ECC useful.

If you really want to go to the edge of bandwidth, you may have to trade off ECC overhead against bandwidth. But I think that's going too far, I mean better stay far from this point and make a reliable connection needing not too much ECC. Again I suggest saving on error check bits and bringing the latency down, and even using tricks such as smearing out error check bits over frames.

"(including all digital broadcast TV, digital cable TV, satellite TV, most digital cellphones,and even CD and DVD players!) use a form of error correction known as Reed-Solomon codes. "

For TV, cellphone and DVD this is a MUST because the signals are highly compressed, and a single bit failure would lead to catastrophic effects at the receiver end. Even for data CDs that holds, although for the original CD format it would be less of a problem. But for our purposes this is no issue unless you also want to do heavy coding.

BTW compression seems a logical choice for our application, there is A LOT of redundancy in the frames. The big reason not to do it is to have a quick recovery in case of failure.

So as you say "Screw it! Errors will occur, and we'll pray that we get a good frame once in a while" It's the only thing you can do anyway! I could add to it "pray that we get a good frame before we reach the ground"

"On the other hand, it sure would be nice to dramatically reduce, and nearly eliminate, the sorts of radio glitches that happen with today's systems from time-to-time."

A SHORT radio glitch is no problem, you just take it as warning that something is wrong. A little bit longer and it gets dangerous. You have to distinguish the short glitch and the lockout, the glitch you can smooth out but the lockout, nothing will help there.
So about the glitch. If you would use a parity bit for each channel and reset the servo immediately to its old value if the parity is wrong I guess that the servo doesn't even have time to get very far, so the effect of a radio glitch can be largely eliminated anyway. Alternatively you could fully eliminate the glitch by accepting a 1 ms extra latency, so that the servo stop pulse always comes after the parity bit comes through. You might even turn this question into a selling point, by leaving this little choice to the user: 16ms latency with 'glitch warning' during bad reception, or 17 ms latency with ironed-out glitches and only failsafe when it gets too bad!
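
A minimal sketch of the per-channel parity idea, in Python. The 10-bit position, the even-parity convention, the bit layout and the function names are assumptions for illustration only; they are not taken from any existing PCM format.

```python
def parity_bit(value):
    """Return 1 if `value` has an odd number of set bits, else 0 (even parity)."""
    return bin(value).count("1") & 1


def encode_channel(position):
    """Pack a 10-bit servo position and its parity bit into an 11-bit word."""
    position &= 0x3FF
    return (position << 1) | parity_bit(position)


def update_servo(last_good, word):
    """Return the new servo position, or keep the old one if parity fails."""
    position = word >> 1
    received_parity = word & 1
    if parity_bit(position) == received_parity:
        return position
    return last_good
```

With `word = encode_channel(512)`, `update_servo(300, word)` gives 512, while a single flipped bit (`word ^ 0b100`) leaves the servo at 300. Two flipped bits (`word ^ 0b110`) pass the check and deliver 515, which is the weakness of plain parity against burst errors that Mark describes.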
11-03-2003 Over year old.
 
 
FredericG
Heliman
Location: Belgium

PCM1024Z

Hi Mark, Angelos, Pasman and all others,

First of all I am very impressed with your projects. The effort you put in documenting, describing your progression and sharing your knowledge, is also great!

It looks like many people are interested in decoding the PCM1024Z and are perhaps actively researching it. Should we perhaps start a dedicated thread for this?

I was also wondering: the TX project of Angelos, is it meant to be a commercial product? Is this why he is so reluctant to share details about the format?

Thanks,
Frederic
11-03-2003 Over year old.
 
 
w.pasman
Elite Veteran
Location: Netherlands

Hi Frederic

Yes that would be a nice idea. I think that we would be able to decode the 1024Z signal if we just collect and merge all knowledge that seems to be available here already. Angelos claims to have done this, and so does Coert. And then we have the undocumented code from that autonomous helicopter project.
11-04-2003 Over year old.
 
 
Angelos
Key Veteran
Location: nr Oxford, OX11, UK

Any protocol can be patented even if it has minor differences from another. In the case of PCM1024Z I believe that if there is a patent it should by now have expired. PCM1024Z has been on the market for over 14 years, which is how long patents usually last (please correct me if I am wrong on this one). I will have to look into that better as I have an interest in producing the first aftermarket PCM1024Z receivers. Any help with the RF front-end will be appreciated as this is not one of my strong areas. If anyone comes up with a good design using parts currently in production I will pay for your work. Motorola had a range of single chip dual conversion receivers intended for cordless phones but they seem to be discontinued now that DECT phones became popular.

Regarding my TX design, it is too early to tell if it will ever evolve to a commercial product but whatever the case it will be an open source platform. At the moment I plan to develop a replacement set of PCBs for the 9Z. I recently bought a faulty 9Z which I plan to use as housing for my design. I was using a Futaba Challenger until now. When the hardware is finalised and basic firmware (PPM, PCM, servo travel limits and servo reverse) is ready I will perhaps produce a few more assembled PCBs for anyone who wants to modify his 9Z and help with the software development. I am trying to establish some rules for making the software modular.
11-04-2003 Over year old.
 
 
FredericG
Heliman
Location: Belgium

Hi Angelos,

Quote 
but whatever the case it will be an open source platform

Will this also include the PCM coding/decoding code?

Please, don't get me wrong, I am not pushing you in any way.... It is just interesting to know.

Thanks a lot,
Frederic
11-04-2003 Over year old.
 
 
w.pasman
Elite Veteran
Location: Netherlands

I made a new thread on decoding PCM1024Z, so let's not derail Mark's thread any further on this. Mark, we have created quite some offspring from the original thread!
11-04-2003 Over year old.
 
 
Angelos
Key Veteran
Location: nr Oxford, OX11, UK

FredericG,
Since I am considering the possibility of manufacturing PCM receivers I don’t want to rush and publish the protocol, something which I could regret later. Once the information is on the internet it is just a matter of time until it reaches the competition.

Regarding the transmitter I don’t know how it will be implemented yet. If I use a second chip to encode the protocol, the process will be transparent to the main software which will not need to know how the protocol works. If the main CPU does the protocol encoding too, then it could be a BIOS function or a precompiled library. I may even publish the source code for both encoder and decoder if I decide not to go ahead with the receivers.

-Angelos
11-04-2003 Over year old.
 
 
Angelos
Key Veteran
Location: nr Oxford, OX11, UK

Using a separate protocol CPU… pros and cons!

The official Futaba spec for servo pulses: 920usec (min), 1520usec (neutral), 2120usec (max). Since Futaba is the market leader I would recommend that everyone follows this spec if they are developing any type of servo signal generator (PCM decoder, gyro, governor, etc).

100% ATV produces 70% of the total servo movement and a pulse in the range 1098usec to 1941usec.

According to one of their service engineers, the service manual recommends that if the servo pulse is less than 919usec or more than 2121usec the decoder CPU crystal frequency must be measured and verified.

From the above I calculated that the pulse change is 2120-920=1200usec (1.2msec)

This 1.2msec is divided in 1024 positions thus 1.1718usec per step.

Here is another way to verify this…

Futaba uses a 3.4133MHz crystal for the PCM decoder CPU. This frequency I believe is internally divided by 4 and generates the clock for the hardware timers/counters that generate the servo pulses.

3.4133MHz / 4 = 853325 Hz, the period of this is 1 / 853325 = 1.1718usec

Also, this 3.4133MHz signal can be divided by 512 to give a 6.6666KHz clock. The period of this is 150usec, which is the bit period of PCM1024Z.
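
The arithmetic above can be checked in a few lines of Python. This only restates the numbers quoted in this post; the variable names are mine.

```python
PULSE_MIN_US = 920.0     # minimum servo pulse quoted above
PULSE_MAX_US = 2120.0    # maximum servo pulse quoted above
STEPS = 1024             # PCM1024 positions
CRYSTAL_HZ = 3.4133e6    # decoder crystal frequency quoted above

# Step size from the pulse range: 1200 usec spread over 1024 positions.
step_from_range_us = (PULSE_MAX_US - PULSE_MIN_US) / STEPS

# Step size from the crystal: crystal / 4 clocks the timer.
timer_hz = CRYSTAL_HZ / 4
step_from_crystal_us = 1e6 / timer_hz

# Bit clock: crystal / 512.
bit_clock_hz = CRYSTAL_HZ / 512
bit_period_us = 1e6 / bit_clock_hz

print(f"step from range   : {step_from_range_us:.4f} usec")
print(f"timer clock       : {timer_hz:.0f} Hz")
print(f"step from crystal : {step_from_crystal_us:.4f} usec")
print(f"bit clock         : {bit_clock_hz:.1f} Hz")
print(f"bit period        : {bit_period_us:.2f} usec")
```

Both routes give a step of roughly 1.172usec and a bit period of roughly 150usec; the two step values agree to about four significant figures because 3.4133MHz is itself a rounded figure.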

For my PCM decoder I use a custom made crystal at 13.6532MHz which is 4 times higher than what Futaba uses. This higher speed is not required for computations, but because of the configuration of the timers inside the microcontroller that I use this was the lowest frequency that is suitable.

If I go ahead having a second protocol CPU, I can similarly clock that at 13.6532MHz and generate a perfect PCM bit stream or PPM frame with perfect 1.1718usec resolution.

If I go ahead with the single CPU approach, I can still generate a good PCM bit stream using timers and interrupts. However I am not sure that I could generate a PPM frame with 1.1718usec pulse resolution using a timer that generates interrupts. In any case the PPM signal has considerable jitter when recovered at the receiver. Thus it wouldn’t really matter if the software induces a bit of jitter too.

What are your thoughts? Perhaps it is worth the hassle to make an attempt at a single CPU solution and see the results.
11-04-2003 Over year old.
 
 
MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, Gang!

Sorry I've been incommunicado - I've been coding and documenting like a mad fiend! My apologies, too, to those that have sent me PMs and emails - I've been so focused that I haven't been answering anything - sorry, gang! The receiver is nearly complete (with the exception of that darned +/-50 nS jitter), except that I've run into yet another interesting challenge. I've been completing the failsafe functionality, and have realized why most folks use long preambles!

The issue is kind of complicated. Earlier in this thread, I'd briefly discussed preambles - that is, the known bit pattern which starts each received frame. For speed, I'd adopted a short four-bit preamble, short enough that at powerup, the receiver would probably falsely lock on to data that wasn't an actual preamble a few times before it actually began synchronizing with the transmitter. This isn't really a problem, since the CRC check would prevent those false lockons from actually doing anything, and after synchronization, the receiver would maintain lock just fine. However...

If the signal deteriorates to the point where the failsafe is activated, then we have a problem. Since traditional analog servos only generate torque when they continue to receive servo pulses, we need to keep sending servo pulses even in the absence of incoming frame data. I've added that functionality, no problem, but have realized that whenever random noise is interpreted as a preamble, it will delay the servo pulses. Note that the pulses will still have "perfect" pulse widths corresponding to the failsafe settings (hold or set), but the period of the pulses will be greater than one frame time whenever this false lock occurs.

We can make this a non-issue by extending the preamble to be a greater number of bits, but that once again slows down the frame rate. I'm thinking about living with this, since servos don't care too much about the drive period, but it is an interesting situation!
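
A rough way to see the preamble length trade-off is a Python sketch under a simplifying assumption: with no signal, the data slicer is taken to output independent, equally likely 0s and 1s. Real noise after a data slicer and 5X oversampling will not behave this cleanly, and the number of windows per frame used in the example is a made-up figure, not the real frame length.

```python
def false_preamble_probability(preamble_bits):
    """Chance that one window of random, equally likely bits equals the preamble."""
    return 0.5 ** preamble_bits


def expected_false_locks(preamble_bits, windows_per_frame):
    """Expected number of false preamble matches during one frame time of noise."""
    return windows_per_frame * false_preamble_probability(preamble_bits)


# Hypothetical example: 32 candidate bit positions per frame time.
for bits in (4, 8, 16):
    print(bits, false_preamble_probability(bits), expected_false_locks(bits, 32))
```

Under these assumptions each extra preamble bit halves the chance of a false lock, while adding one bit time to every frame.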

Angelos: I'm sorry that it's taken me as long as it has to publish my code, but it will easily accomplish the timings that you are trying to achieve. Not only can it match those timings, it will do so while sampling the RF five times faster than you've mentioned in order to deliver superior RF performance. As I mentioned, it delivers +/- ~50 nS accuracy on all servo timings, and +/- ~100 nS accuracy on RF sampling timings. This is done on a single Philips LPC2104 ARM7TDMI processor with a standard 10 MHz crystal (running internally at 60 MHz) - the only other components that are required are a zero-crossing detector (data slicer), servo output buffer/line driver, bypass caps, and the RF section.

For the specific data rates you mention, the 10 MHz crystal would generate an additional slight, but inconsequential error - I'll calculate it and post the actual timing error later on today.

Since my target is building a better system, I haven't worried about decoding the Futaba output format. If you wanted to do so, it would probably be fairly straightforward to change my receiver's frame decoding to match the Futaba format. Now, if you're going to sell products based on my code, I mentioned earlier that the "price" is sending me one of whatever you build that uses it, on the honor system. If you instead want me to adapt the code for you, or support you as you make the changes, then I'm willing to do so for hire. I'm sorry that I have to do that, but working on the Futaba system would distract me from my real target, which is delivering an R/C system that will far outperform existing products.

W.Pasman: I've decided that the only way to really know which approach to take is to try both out! I started creating an RF simulator, and realized that the issue isn't performance in normal AWGN (Additive White Gaussian Noise), the concern is the RF generated by the moving parts of the helicopter, static discharge due to atmospheric phenomena like nearby rain or lightning, etc. In other words, the only way to know how well a single-channel driven system will compare with a frame-based system will be to try both out in the real world.

Changing to a single-channel driven system will require a fairly major restructuring of the code, and will mean rewriting a lot of the documentation, which is where most of the time goes. Consequently, I'm going to finish the frame-based system first, then I'll generate a channel-based system later. So that folks won't have to keep waiting to see my code, I'll publish the frame-based code first, and will then release the channel-based code once that's done.

Frederic: Thank you very much, and welcome to the forum!

Cheers!
MarkF
11-05-2003 Over year old.
 
 
Angelos
Key Veteran
Location: nr Oxford, OX11, UK

Mark,
I am not really concerned how the receiver is implemented as long as it can provide servo pulses in the range 920usec to 2120usec and there are at least 1024 steps in there. Additional ways to control servos by placing them on a data bus are a bonus, but considering the range of servos, gyros, speed controllers and other devices that are currently on the market, backward compatibility with the old fashioned servo pulse is a must.

Regarding the TX side, if I go for a single CPU approach, I want to make sure that the CPU will be able to do the calculations and maintain precise timing for the frame generation. I want to maintain backward compatibility with PPM and PCM and the way I plan to generate these frames is using a hardware timer that generates an interrupt every time the RF signal must be changed. All frame signal transitions will be precalculated and stored in a table. Then the interrupt will only perform the next signal transition, update the timer, increment the table pointer and return to the main program. If your approach for generating the RF frame is similar (timer/interrupt) then your software will be easily integrated with mine. Are you coding in C or ASM? To keep the system open source I thought to go for the GNU C compiler.
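
The table-driven scheme described above can be modelled in a few lines of Python. This is only a simulation of the idea, not firmware: the NRZ bit stream, the 150usec bit period (taken from the earlier PCM1024Z discussion) and the names `build_transition_table` and `FrameSender` are assumptions for illustration.

```python
BIT_PERIOD_US = 150  # bit period discussed earlier in the thread


def build_transition_table(bits):
    """Precalculate [(time_us, level), ...], one entry per change of output level."""
    table = []
    level = None
    for i, bit in enumerate(bits):
        if bit != level:
            table.append((i * BIT_PERIOD_US, bit))
            level = bit
    return table


class FrameSender:
    """Stands in for the timer interrupt handler stepping through the table."""

    def __init__(self, table):
        self.table = table
        self.index = 0
        self.output = 0

    def on_timer_interrupt(self):
        """Perform the next transition, advance the pointer and return the
        time of the following transition (None when the frame is finished)."""
        _, level = self.table[self.index]
        self.output = level
        self.index += 1
        if self.index < len(self.table):
            return self.table[self.index][0]
        return None
```

For the bits `[1, 1, 0, 1, 0, 0]` the table is `[(0, 1), (300, 0), (450, 1), (600, 0)]`, so the handler runs only four times for six bits and does no calculation of its own. On real hardware the returned time would be written to the timer's match register.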

I am not sure how this will work out but if it becomes commercial and you wish to implement your code in it I will be happy to supply you with a unit. However, one thing I need to know beforehand is what hardware requirements you have for the TX side. From what I gather a DAC is all you need and it could be used to implement many other coding techniques. In fact I don’t see why I couldn’t generate PPM/PCM using this by transitioning only between two values. How often do you need to update the DAC and what voltage range do you need?

Cheers,
Angelos
11-05-2003 Over year old.
 
 
MarkF
Senior Heliman
Location: Palo Alto, CA

Hi, Angelos!

I’m sorry that I misunderstood your request! Compared to a 5X oversampling receiver, a PCM transmitter is really trivial. The receiver needs a 60 MHz, 32-bit CPU, while my PCM transmitter simulator will actually be written on a 5 MHz 8-bit PIC CPU! As you state, though, the timing for a PPM transmitter is definitely more demanding than PCM (though still far simpler than the receiver).

This is actually a pretty good candidate for the ARM’s FIQ interrupt. What you’d do for precision would be to set off a timer interrupt for each output that is about 50-60 cycles before the specific output time target, then you’d read the H/W time clock, and execute a variable delay to synchronize the CPU with the exact target time. Here’s a code fragment that performs this trick on the Philips CPU that’s loosely based on my event code:

        LDR     R2,HW_TIMER             ; Load R2 with base address of the timer/counter
        LDR     R2,[R2]                 ; Read the H/W Timer/Counter to see what time it is now
        LDR     R3,time_target          ; Get desired time for the output event [Start Event_Overhead]
        SUB     R3,R3,R2                ; Subtract current time from target time to get delay
        SUBS    R3,R3,#EVENT_OVERHEAD   ; Subtract off the fixed amount of processing overhead
        BMI     EP3                     ; ERROR - we missed the execution point for this routine!
;
; For the final time alignment, it's time to delay the precise number of CPU cycles
; that will make the output instruction occur at the exact cycle necessary.
; To do that, we first figure out how many 4 clock cycle loops are needed to get
; close, then use a variable number of NOPs to pinpoint the exact cycle count.
;
EP1:    MOVS    R2,R3,LSR #2            ; Compute number of loops to execute
        AND     R3,R3,#03H              ; Place fractional delay count in R3
EP2:    SUBNES  R2,R2,#1                ; If loop count isn't already 0...
        BNE     EP2                     ; ...then loop until the count becomes 0
        RSB     R3,R3,#2                ; Make the delay backwards (0->2, 1->1, 2->0, 3->-1)
        ADD     PC,PC,R3,LSL #2         ; Exec variable # of NOPs to achieve sync
        NOP                             ; Fractional delay was 3 - execute 3 NOPs [NOPs don't count]
        NOP                             ; Fractional delay was 2 - execute 2 NOPs [...in event OVERHEAD, ]
        NOP                             ; Fractional delay was 1 - execute 1 NOP [...they just pad. ]
                                        ; [End Event_Overhead, which is number of clocks since Start Overhead]
        STR     R4,[R5]                 ; Store the output value in R4 to output port in R5

I apologize that the format looks all screwed up due to the forum's line wrapping!
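
For anyone who wants to check the delay arithmetic in the fragment above without an ARM at hand, here is a small Python model of it. It assumes what the comments state: each pass through the EP2 loop costs 4 clocks and each NOP costs 1 clock. It also relies on the ARM rule that reading PC yields the address of the current instruction plus 8. The function names are mine.

```python
def split_delay(cycles):
    """Mirror MOVS R2,R3,LSR #2 and AND R3,R3,#03H: whole 4-clock loops
    plus a fractional remainder of 0..3 clocks."""
    return cycles >> 2, cycles & 3


def nops_executed(frac):
    """Mirror RSB R3,R3,#2 and ADD PC,PC,R3,LSL #2.

    Number the instructions after the ADD as slots 1..4: three NOPs, then the STR.
    PC reads as the ADD's address + 8, which is slot 2, so the branch lands on
    slot 2 + (2 - frac). Every NOP from there up to the STR is executed.
    """
    target_slot = 2 + (2 - frac)
    return 4 - target_slot


def padded_cycles(cycles):
    """Total padding clocks inserted before the STR."""
    loops, frac = split_delay(cycles)
    return 4 * loops + nops_executed(frac)
```

`padded_cycles(c)` equals `c` for every non-negative `c`, so under the stated timing assumptions the output instruction lands on the requested clock.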

Have Fun!
MarkF
11-05-2003 Over year old.
 
 
Angelos
Key Veteran
Location: nr Oxford, OX11, UK

Mark,
As I am working toward a prototype TX PCB it will be nice to know your hardware requirements on the TX side. I presume you only need a DAC, but what update frequency are you looking at and what voltage range? I presume updates will be relatively slow to keep the signal bandwidth down and that 8 bits resolution will be enough. Any idea about voltage range? Is 0 to 3.3V or 0 to 5V good?

Cheers,
Angelos
11-05-2003 Over year old.
 
 
Thursday, December 4 - 10:17 pm - Copyright © 2000 - 2008 runryder.com