<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title><![CDATA[one picture of ADS today]]></title>
<url>%2F2020%2F11%2F22%2Fone-picture-of-ADS-today%2F</url>
<content type="text"><![CDATA[]]></content>
</entry>
<entry>
<title><![CDATA[mobility data center in ADAS/ADS]]></title>
<url>%2F2020%2F11%2F22%2Fmobility-data-center-in-ADAS-ADS%2F</url>
<content type="text"><![CDATA[keywords: autosar, data factory, AI training, simulation, v&v, DevOps, IoT, adas applications 2~3 years ago, the top teams focus on perception, planning kinds of AI algorithms, which is benefited from the bursting of DNN, and lots invest goes there, and the optimism think once the best training model is founded, self-driving is ready to go. then there was a famous talk about “long tail problems in AV” from Waymo team in 2018, the people realize to solve this problem, they need as many data as possible and as cheap as possible, which gives a new bussiness about data factory, data pipeline. the investors realize the most cuting-edge AI model is just a small piece of done, there should be a data factory, which comes from MaaS serivces providers or traditional OEMs. as data collector doesn’t exist common in traditional vehicles, so OEMs have to first make a new vehicle networking arch to make ADAS/ADS data collecting possible. which by the end, the game is back to OEMs. at this point, IoT providers see their cake in AD market, OEMs may have a little understanding about in-vehicle gateway, t-box, but edge computing, cloud data pipeline are mostly owned by IoT providers, e.g. HuaWei and public data service providers, e.g. China Mobility. and the emerging of 5G infrastructure nationally also acc their share. pipeline is one thing, the other is in-vehicle SoC, which has a few matured choices, such as Renasas, NXP, Mobileye, Nvidia Drive PX2/Xavier/Orin, and a bunch of new teams, such as horizon robotics, HuaWei MDC e.t.c the traditionally definition of in-vehicle SoC has a minor underline about data pipeline and the dev tools around. but nowadays, taking a look at HuwWei MDC, the eco is so closed, from hardware to software, from in-vehicle to cloud. of course, the pioneer Nvidia has expand the arch from vehicle to cloud, from dev to validation already. SoC is the source of ADS/ADAS data, which give the role of SoC as mobility data center(MDC), we see the totally mindset transfer from software define vehicle to data defined vehicle. the mechanical part of the vehicle is kind of de-valued when thought vehicle just as another source of data on-line. to maximize the value of data, the data serivces(software) is better decoupled from vehicle hardwares(ecu, controller), which is another trend in OEMs, e.g. autosar. till now, we see the AI models, simulation, data services are just the tip of the iceberg. and this is the time we see self dirving as the integrated application for AI, 5G, cloud computing infra and future manufacturing. and the market is so large, no one can eat it all. 
refer: AutoSAR Classic from Huawei. Mobileye's famous EyeQ series chips also embed perception algorithms, but Mobileye usually sells hardware and software as a bundle; it does not tailor the product to individual customers or let customers run their own algorithms on its perception chip. Horizon Robotics, by contrast, takes a fully open approach: it can provide hardware alone or a complete solution including algorithms, and it also gives customers a full toolchain called 天工开物 (TianGong KaiWu) so they can tune and optimize the algorithms on the chip themselves. Matrix2 (Horizon Robotics), MDC (Huawei), Drive PX 2 (Nvidia), DRIVE AGX Xavier (Nvidia), Orin (Nvidia), vehicle intelligent computing platform reference architecture 1.0. nvidia self-driving: Nvidia leads on the safety standards front. Drawing on its own safety and engineering experience, NVIDIA leads the connected and automated vehicle working group of the European Association of Automotive Suppliers (CLEPA). NVIDIA has a long history in simulation technology and functional safety, and its autonomous vehicle team has valuable experience in automotive safety and engineering. With a platform like NVIDIA DRIVE Constellation, manufacturers can run long-distance driving tests of their technology and set up rare or dangerous test scenarios that are seldom encountered in the real world. NVIDIA also works with the Association for Standardization of Automation and Measuring Systems (ASAM), leading one of its working groups to define open standards for creating simulation scenarios, covering road topology representation, sensor models, world models, industry standards and key performance indicators, to advance validation methods for autonomous vehicle deployment. The industry is also developing a new standard, ISO 21448, known as Safety Of The Intended Functionality (SOTIF). It aims to avoid situations that can still create risk even when all vehicle components are operating normally; for example, if a deep neural network running in the vehicle misidentifies a traffic sign or an object on the road, an unsafe situation can arise even though the software has not failed. nvidia drive: Drive OS; Drive AV (a variety of DNNs); Drive Hyperion (AGX Pegasus and sensors); Drive IX; Drive Mapping; Drive Constellation, a data-center solution to test and validate the actual hardware/software of an AV. data factory -> AI training -> we also expand the use of our DNNs to support features like automatic emergency steering and autonomous emergency braking, providing redundancy for these functionalities. we also define key performance metrics to measure the quality of the collected data and add synthetic data into our training datasets. we incorporate actual sensor data from automatic emergency braking scenarios using re-simulation to help eliminate false positives. NVIDIA created the DRIVE Road Test Operating Handbook to ensure a safe, standardized on-road testing process.]]></content>
</entry>
<entry>
<title><![CDATA[leetCode_swap_pairs]]></title>
<url>%2F2020%2F11%2F21%2FleetCode-swap-pairs%2F</url>
<content type="text"><![CDATA[backgroundLeetCode24: swap pairs intuitive sol123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899 struct ListNode { int val; ListNode *next; ListNode() : val(0), next(nullptr) {} ListNode(int x) : val(x), next(nullptr) {} ListNode(int x, ListNode *next) : val(x), next(next) {} };#include <iostream>class Solution {public: ListNode* swapPairs(ListNode* head) { ListNode *dummy = new ListNode(-1); ListNode *prev = dummy ; prev->next = head ; ListNode *p1=head, *p2=p1->next, *p3=p2->next, *p4=p3->next, *pn, *tmp ; if(p1 == nullptr) { return nullptr ; }else if(p2 == nullptr){ return p1 ; }else if(p3 == nullptr) { p2->next = p1 ; p1->next = nullptr ; head = p2 ; return head ; }else if(p4 == nullptr) { head->next = p2 ; p2->next = p1 ; p1 ->next = p3; return head ; } for(p1=head, p2=p1->next, p3=p2->next, p4=p3->next ; p4 && p3 && p2 && p1; prev=p3, p1=prev->next, p2=p1->next, p3=p2->next, p4=p3->next ){ pn = p4->next ; p2->next = p1 ; p4->next = p3 ; p1->next = p4; p3->next = pn ; prev->next = p2 ; if (pn == nullptr) { break; }else if(pn->next == nullptr) { break; }else if (pn->next->next == nullptr) { break; }else if(pn->next->next->next == nullptr) { break; }} if(pn == nullptr || pn->next == nullptr){ return prev->next; } if(pn->next->next == nullptr ) { tmp = pn->next ; prev->next = pn->next ; tmp->next = pn ; pn->next = nullptr; }else if(pn->next->next->next == nullptr){ tmp = pn->next ; ListNode *last = tmp->next ; prev->next = tmp ; pn->next = last ; } return prev->next; }};int main(){ ListNode *head = new ListNode(1); ListNode *sec = new ListNode(2); ListNode *thd = new ListNode(3); ListNode *fth = new ListNode(4); head->next = sec ; sec->next = thd ; thd->next = fth ; Solution *sol = new Solution(); ListNode *res = sol->swapPairs(head) ; std::cout << res->val << ', ' << res->next->val << ', ' << res->next->next->val << std::endl; return 0;}]]></content>
<tags>
<tag>leetCode</tag>
</tags>
</entry>
<entry>
<title><![CDATA[why Hil MiL SiL and ViL]]></title>
<url>%2F2020%2F10%2F12%2Fwhy-Hil-MiL-SiL-and-ViL%2F</url>
<content type="text"><![CDATA[backgroundthere are two very different teams in AD. one kind is people from top Tier1 and popular OEMs, the benefit of these guys is they are very matured at the product process management, e.g. V&V and the corresponding R&D process, the other pons of these guys, they have lots of engineering know-how/experience to make things work. mostly we know how the system/tool work, but compared to build the sytem/tool to work, the first knowledge is about 10%. the mature R&D process is, as I see, is very valuable in a long time, which we’d say engineers getting richer as they getting older. till now, German and Janpanese top companies still have strong atmosphere to respect engineers, which keeps their engineers and their industry process management growing more and more mature. that’s a very good starting point for fresh young men, if they can join these top companies in early time. the other team is from Internet companies, they are the kind of people with philosophy: as you can image out, I can build it up. while the philosophy always become true in IT and Internet service companies, and looks promising in industry fields, as the IT infra, like nowadays, service to cloud, office work in cloud, which derives lots of cloud infra, but the core like robot operation, making vehicle driving automatically, or computer aided engineering e.t.c. requires lots of know-how, far beyond the infra. Internet teams are popular in China and US, but US still have strong engieer atmosphere like Germany, which is not in China. basically I want to say, there is no apprentice process to train fresh men to matured engineers in either companies or training organizations in China, which makes engineers here show low credit. even Internet knowledge and skills in China, often we’d say programming work are youth meals, as programmer getting older they getting poor that’s the tough situation for many young people, if the men got neither good engineering training, nor got smart and young enough to coding, he get doomed in his career. but that’s not a problem for the country neither for the companies, still Chinese workers are cheap, there are always lots of young people need bread beyond a promising career. physical or virtualthe first-principal of writing a program: make it run ok make it run correct make it run performance-well make it extensible the following is to understand why we need HiL, mil, sil, to ViL during verification and validation of an ADS product, from the physical or virtual viewpoint. physical or virtual worldcase 1) from physical world to physical sensornamely, physical road test, either in closed-field, or open roads. this is the case when we need ground truth(G.T.) system to validate the device under test(DUT) sensor, to evaluate or test the DUT’s performance boundary, failure mode e.t.c. the people can come from sensor evaluation teams, or system integration teams. case 2) from virtual world to physical sensorthe virtual world come either from digital twins, or replay from a data collector. this is hardware in loop(HiL) process, which used a lot in validate either sensor, or MCUs. when using HiL to validate MCUs, the virtual world is mimic signals e.t.c virtual sensorcase 3) from virtual world to virtual sensorvirtual sensor has three kinds basically: ideal (with noise) sensor model statisticsly satisfied sensor model physical sensor model of course, the costing is increasing from ideal to physical sensor model. 
ideally, the downside fusion module should be no senstive to sensors, either phyiscal sensors or virtual sensors(sensor models). the benefits of capturing virtual world by virtual sensor is so cheap, which makes it very good to training perception algorithms, when physical sensor data is expensive, which is often the reality for most companies now. there is local maximum issues with virtual sensor data to train AIs, so in reality, most team used to mix 10% real-world sensor data to improve or jump out from these local maximums. of course, virtual sensor is one fundamental element to close loop from virtual world to vehicle moving, in a sim env. physical or virutal perception to fusionhere is how to test fusion, does it work correctly, performance well, how to handle abnormal(failure) case. case 4) from physical sensor perception data to fusionphysical perception data comes in two ways: the sensor system in vehicle ground truth system(G.T.) during RD stage, the sensor system in vehicle is treated as device under test(DUT), whose result can compare with the labeling outputs from G.T. system, which help to validate DUT performance, as well as to evaluate fusion performance. in phyiscal world, ground truth sytem is usually equipped with a few higher precision Lidars, of course the cost is more expensive. another pro of physical perception data is extract edge cases, which used to enrich the scenario libs, especially for sensor perception, and fusion. during massive production stage, when the data pipeline from data collector in each running vehicle to cloud data platform is closed-loop, which means the back-end system can extract and aggregate highly volume edge case easily, then these large mount of physical perception data can drive fusion logsim. question here, edge case scenarios are these abnormal perception/fusion cases, but how to detect edge cases from highly volume data in cloud(back-end), or to design the trigger mechanism in vehicle(front end) to only find out the edge cases, is not an matured work for most teams. another question here, to evaluate fusion, requires objective ground truth, which is not avaiable in massive production stage. an alternative option is using a better performance and stable sensor as semi ground truth, such as Mobileye sensor. case 5) from virtual sensor perception data to fusionwhen in sim world, it’s easily to create a ground truth sensor, then it’s easily to check fusion output with the g.t. sensor, which is great, the only assuming here, is the sim world is highly vivid of the physical world. if not, the ground truth sensor is not useful, while obviously to build a digital twin as phyiscal as possible is not easy than create the ADS system. on the other hand, if the sim world is not used to evaluate fusion, for example, used to generate synthetic images, point cloud to train perception AI modules, which is one benefit. in summary, when evaluate and validate fusion, it requires ground truth labelling, either from physical g.t. system, or virtual g.t. sensor. 1) the g.t. system is only used for during R&D stage, with a small volumn of g.t. data; for massive release stage, there is no good g.t. source yet; 2) g.t. sensor in virtual world is easy to create, but requires the virtual worls is almost physical level. second opnion, fusion evaluation is deterministic and objective. so if the fusion can validated well during RD stage, considering its performance robost, stable, there is no need to validate fusion module in massive lease. 
when perception/fusion edge case happens, we can study them case by case. third opinon, for anormal case, e.g. sensor failure, sensor occupied cases, also need validate during RD. planningthe evaluation of planning good or not is very subjective, while in fusion, ground truth is the criteria. so there are two sols: to make subjective goals as much quanlity as possible define RSS criterias to bound the planning results ideally, planning should be not sensitive to fusion output from physical world, or virtual world. and when come to planning verification, we asumme the fusion component is stable and verified-well, namely should not delay find fusion bugs till planning stage. case6) from sim fusion output to planningSiL is a good way to verify planning model. previously, planning engineers create test scenarios in vtd, prescan, and check does the planning algorithms work. for a matured team, there are maybe hundreds and thousands of planning related scenarios, the should-be verification process requires to regress on all these scenarios, so to automatically and accelerate this verification loop, gives the second solution: cloud SiL with DevOps, like TAD Sim from Tencent, Octopus from Huawei e.t.c. another benefits of SiL in cloud is for complex driving behavior, real vehicle/pedestrains to vehicle interactions, and especially when the scenario lib is aggregating as the ADS lifecycle continues. if the planning algorithm is AI based, then to mimic the real-human drivers driving behaivor and assign these driving behaviors to the agents in the virtual world, will be very helpful to train the ego’s planning AI model. here are two models: imitation learning and reinforcement learning. firstly we train the agent/ego driving behaivor using physical world driving data by imitation learning in a training gym; and transfer these trained model to agents and ego in the sim world, they continue driving and interaction with each other, and keep training itself to do better planning. for corner/edge cases, as planning is very subjective, and lots of long-tail scenarios are there, so physical world situations collection is very valuable. case 7) from physical fusion output to planningTesla looks be the only one, who deployed the shadow mode trigging mechanism to get planning corner cases from physical driving, and the data close loop. the large volumn of physical driving data is very useful for verification and iteration of planning: aggregate more real planning scenarios, by detecting edge cases train better driving behavior, as close as possible to humans controlcontrol is the process from planning output, e.g. acc, decel, turning-angle to physical actuator response, brake force, engine torque e.t.c there are a few common issues : nonlinear characteristic, mostly we don’t get a perfect control output as expect. e.g. from decel to brake force. the actuator has response delay, which need professional engineers train to get a good balance, but still doesn’t work as expect at all the scenarios the actuators as a whole is very complex, tuning requires. case 8) from planning to virtual vehicle modelthis is where sim requires a high-precise vehicle dynamic model, which affects the performance. but a simple vehicle dynamic does work for planing, if not requires to cosist with the physical performance. case 9) from planning to physical vehicleViL, which is another big topic]]></content>
<tags>
<tag>simulation</tag>
</tags>
</entry>
<entry>
<title><![CDATA[find out ADS simulation trend from top teams]]></title>
<url>%2F2020%2F10%2F07%2Ffind-out-ADS-simulation-trend-from-top-teams%2F</url>
<content type="text"><![CDATA[backgroundwhat’s the basic logic behind simulation in autonomous driving system(ADS) ? or why we need simulation in ADS, during AD development and even after massively produced ? how to integrate simulation in the AD closed-loop platform, namely from fleet/test data collection to valuable new features? before I can answer these questions, a few steps has went through: L2 commerical sims e.g. Matlab ADS box, vtd, prescan e.t.c. L3+ WorldSim few years ago, most ADS teams get the knowledge: AD require at least 10 billion miles verification, which can’t be done in traditional test field. so comes digitial twins, game engine based simulator, e.g. LGsim, Carla, and Tencent, Huawei etc have already made these simulator as service in their public cloud. we also try LGsim, and deploy it in k8s env, that was a good time to learn, but not really helpful in practice. L2 LogSim to assist L2 ADAS functions/algorithms/features iteration, as mentioned in system validation in l2 adas, most l2 adas features doesn’t necessarily depends on Logsim, standard benchmark test suit are enough most time. of course, with more mount of LogSim, definitely can expose more bottleneck of the existing l2 adas features, but there is a balance between costing and profit. currently, large mount of physical data based L2 adas feature optimization is not an efficient invest for most OEMs. the special case of l2 feature is false positive of AEB system, which actually requires statiscally measurement. in a common development cycle, AEB system have about 5 major versions, and a few minor versions in each major version, so each version deployment need to check its false positive statically, which make logsim is a have-to. another common usage of logsim is sensor algorithm development, e.g. Bosch Radar. before each new version of radar algorithm release, they have to check its performance statiscally through logsim. LogSim is a very good starting point to integrate simulator into the ADS closed-loop platform, while we deal with data collection, clean, uploading, storage in db, and query through data engine, and DevOps to automatically trig to run, analysis and report. system verification and validation simulation is a test pathway, that’s the original purpose of simulation. traditionally, new vehicle release has went through closed field test, which has changed during ADS age, as ADS system is open-loop, there is no test suit can do once and forever, so miles coverage and scenario coverage are the novel pathways in ADS validation, which leads to simulation. ADS regulation evalution that’s the new trend as ADS getting popular, the goverment make rules/laws to set bars for new ADS players. one obligatory way is simulation. thanks for China Automotive Research Institute(CARI) teams. the following I’d share some understanding of simulation from top AD simulation teams: QCraftOur mission at QCraft is to bring autonomous driving into real life by using a large-scale intelligent simulation system and a self-learning framework for decision-making and planning. QCraft was ex-waymo simulation team, has a solid understanding about simulation in AD R&D. 
the logic behind QCraft: perception is a fairly deterministic problem; how to test and evaluate it is very clear, and the overall methodology is well understood. planning and decision-making is seen as the most challenging problem today: first, its uncertainty is hard to measure, and the existing criteria for judging whether planning is good are comfort and safety; second, from a methodology point of view, the mainstream planning and decision-making approaches in the industry have not fundamentally advanced compared with 20 years ago, and imitation learning or reinforcement learning methods still face many problems in large-scale practical use. many start-ups follow the same build-from-scratch sequence: first mapping and localization, then perception, and only at the end planning/decision-making and simulation. this is a very good point, as most early start-ups focus on AI based perception or planning, but gradually people realize that without a disruptive breakthrough in AI, evaluating AI models is too slow, and collecting useful data is the real bottleneck; the AI model is the cheapest thing compared to the data. the other point: prediction & planning (P&P) is the real challenge in L3+ ADS; for L2 adas features, P&P is knowledge based, which is a 20-year-old mindset. since P&P in L3+ requires an efficient way to test, simulation comes in. corner cases: before you run into the wild ducks, you do not even know wild ducks will be a problem, so you need a way to (automatically) discover new corner cases and then solve them. beyond collecting large amounts of data, the more important thing is to build an automated production factory that turns the continuously collected valid data, through automated tooling, into usable models, so corner cases can be handled faster and more efficiently. the test tools are there to help engineers develop efficiently, reproduce on-road problems quickly, expose potential problems early, and provide an evaluation system that tells whether one version is better or worse than another. QCraft keeps its test system highly consistent with the on-vehicle system: a problem seen on the road can be reproduced in simulation and fixed, so the same problem does not appear on the next road test; and their scenario library is highly consistent with the real environment, because it is learned from reality in the first place. QCraft does not want to "see the trees but miss the forest", i.e. to fall into one small concrete application scenario by patching case after case and become a non-scalable engineering company that solves problems by piling up people. in a word, build a closed-loop data platform to drive ADS evolution automatically, i.e. self-evolution; that is the real face of the autonomous vehicle. QCraft: what kind of simulation system does driverless driving need? simulation software built on game engines is mostly "flashy but impractical". one important use of simulation in autonomous driving is to replay a road test, but because such third-party software is developed independently of the autonomous driving software, it is hard to guarantee the determinism of each module, so the whole simulator becomes non-deterministic and in the end less usable. QCraft's simulation system has 5 layers: at the bottom is QCraft's own Car OS, whose communication layer guarantees efficient communication between modules; Car OS and the simulator are highly integrated, and the core simulator and evaluator are built on the Car OS interfaces, which guarantees determinism of the simulation; the next layer up is the simulation toolchain and infrastructure, which keeps the whole data closed loop effective and puts all data to efficient use; the fourth layer is large-scale scenario library construction; and the top layer is a distributed simulation platform that supports fast, large-scale simulation and delivers correct evaluations in a short time. on automatic scenario library generation: the red and green dots in the video are the trajectories of two vehicles; the generation and variation of these trajectories are trained on real traffic datasets with deep learning, and the trained deep neural network (a generative model) then synthesizes large-scale trajectories of interacting vehicles. QCraft believes simulation is the only path to driverless technology at scale: first, simulation and its toolchain form an efficient closed test loop that supports algorithm testing and fast iteration, replacing the approach of piling up people or cars; second, only software validated by large-scale intelligent simulation can guarantee safety and usability. in a word, this is a very ambitious plan, from Car OS to the cloud, the whole data pipeline from fleet data collection to model training to new feature release. the missing part is who would pay for it, and the most evil part is who will pay for the huge amount of engineering work, which is not something that feels exciting for PhDs. Latent Logic: another simulation startup, later acquired by Waymo, offers a platform that uses imitation learning to generate virtual drivers, motorists and pedestrians based on footage collected from real-life roads. Teams working on autonomous driving projects can insert these virtual humans into the simulations they use to train the artificial intelligence powering their vehicles. The result, according to Latent Logic, is a closer-to-life simulated training environment that enables AI models to learn more efficiently. training autonomous vehicles using real-life human behaviour: 'Autonomous vehicles must be tested in simulation before they can be deployed on real roads. To make these simulations realistic, it's not enough to simulate the road environment; we need to simulate the other road users too: the human drivers, cyclists and pedestrians with which an autonomous vehicle may interact.' 'We use computer vision to collect these examples from video data provided by traffic cameras and drone footage. We can detect road users, track their motion, and infer their three-dimensional position in the real world. 
Then, we learn to generate realistic trajectories that imitate this real-life behaviour.’ in a word, Latent Logic is dealing with P&P training in L3+ ADS, which is the real challenging. Roman Roadsusing imitation learning to train robots to behave like human, state of the art behavioral tech. We offer R&D solutions from 3D environment creation, traffic flow collection to testing & validation and deployment. driving behavior data collection ego view collection, fleet road test real-time re-construction real time generation of 3d virtual env and human behavior(imitation learning) data-driven decision making pre-mapping The behavior offset are different between two cities. We collect data, like how people change lane and cut in, and learn the naturalistic behaviors at different cities. in a word, still a very solid AI team, but who would pay for it and how to get valuable data, or integrate into OEM’s data platform, is the problem. Cognatathe customer includes Hyundai Mobis. Cognata delivers large-scale full product lifecycle simulation for ADAS and Autonomous Vehicle developers. trainingAutomatically-generated 3D environments, digital twins or entirely synthetic worlds by DNN, synthetic datasets, and realistic AI-driven traffic agents for AV simulation. validation pre-built scenarios, standard ADAS/AV assessment programs fuzzing for scenario variaties for regulations and certifications UI friendly analysis Ready-to-use pass/fail criteria for validation and certification of AV and ADAS Trend mining for large scale simulation visulizationhow it worksstatic layer: digital twin dynamic layer: AI powered vehicles and pedestrains sensor layer: most popular sensor models based on DNN cloud layer: scalability and on-promise. Cognata as an independent simulation platform is so good, but when considering integration with exisitng close loop data platform in OEMs, it’s like a USA company, too general, doesn’t wet the shoe deeply, compare to QCraft, whose architecture is from down to top, from car OS to cloud deployment, which is more reliable solution for massively ADAS/AD products in future. applied intuitionanother Waymo derived simulation company. Test modules individually or all together with our simulation engine that’s custom built for speed and accuracy. improve perception algorithms, Compare the performance of different stack versions to ground truth data. Test new behaviors and uncover edge cases before pushing updates to your vehicles. Extract valuable data from real world drives and simulations Review interesting events recorded from vehicles and simulations to determine root cause issues. Share the data with other team members for further analysis and fixes. Extract and aggregate performance metrics from drive logs into automatic charts, dashboards, and shareable reports for a holistic view of your development progress. Run your system through thousands of virtual scenarios and variations to catch regressions and measure progress before rolling it out for on-road testing. in a word, looks pretty. metamotosimulation as a service, Enterprise products and services for massively scalable, safe, scenario-based training and testing of autonomous system software it’s an interet based simulator compare to preScan, but in core is just another preScan, with scalability. can’t see data loop in the arch. so it’s helpful during R&D, but not realy usefuly after release. Parallel Domainpower AD with synthetic data, the ex-Apple AD simulation team. 
Parallel Domain claims its computing program will be able to generate city blocks less than a minute, Using real-world map data. Parallel Domain will give customers plenty of options for fine tuning virtual testing environments. The simulator offers the option to incorporate real-world map data, and companies can alter everything from the number of lanes on a simulated road to the condition of its computer-generated pavement. Traffic levels, number of pedestrians, and time of day can be tweaked as well. Nio is the launch customer of Parallel Domain PD looks to train AD in virtual reality from real map to virtual world, and road parameters are programable. but what about micro traffic flow, vehicle-pedestrain-cars interaction ? I think it’s great to generate virtual world with Parallel Domain, but not enough as the simulator in the whole close loop. the collected data, of course include real map info, which can used to create virtual world, but why need this virtual world? is to train AD P&C system, which is more than just the static virtual world and with some mimic pedestrain/vehicle behaviors. in AI powered AD, valid and meaningful data is the oil. the basic understanding here is to with more data, get more robost and general AI model, which means with more data, the AI AD system can do behavior better with existing scenarios, and more importantly, do increase the ability to handle novel scenarios automatically. so is the close loop data in AD. righthook digital twin of real world scenario creation tool powered by AI to derive high-value permuation and test cases automatically(maybe both static and dynamic scenarios) vehicle configuration test management integration with DevOps and cloud on-premise in a world, this is something similar like Cognata, to generate vivid world from physical map data or synthetic data, then add imitation learning based agents, then a DevOps tool and web UI configurable. the most important and also the difficult part of this pipeline, is how to obtain large mount of valid and useful real data as cheap as possible, to train the scenario generator, as well as agents behavior generator. only MaaS taxi companies and OEMs have the data. rfproa driving simulation and digital twin for ad/adas, vd(chassis, powertrain e.t.c) development, test and validation. rFpro includes interfaces to all the mainstream vehicle modelling tools including CarMaker, CarSim, Dymola, SIMPACK, dSPACE ASM, AVL VSM, Siemens Virtual lab Motion, DYNAware, Simulink, C++ etc. rFpro also allows you to use industry standard tools such as MATLAB Simulink and Python to modify and customise experiments. rFpro’s open Traffic interface allows the use of Swarm traffic and Programmed Traffic from tools such as the open-source SUMO, IPG Traffic, dSPACE ASM traffic, PTV VisSim, and VTD. Vehicular and pedestrian traffic can share the road network correctly with perfectly synchronised traffic and pedestrian signals, while allowing ad-hoc behaviour, such as pedestrians stepping into the road. data farming: generating mimic training data the largest lib of digital twins of public roads in the world supervised learning env for perception in a word, looks like a traditionally tier2. edge case researchHologramcomplements traditional simulation and road testing of perception systems Hologram unlocks the value in the perception data that’s collected by your systems. It helps you find the edge cases where your perception software exhibits odd, potentially unsafe behavior. 
from recorded data, to automated edge case detection(powered by AI), The result: more robust perception most of the cost of developing safety-critical systems is spent on verification and validation. in a word, safety is a traditional viewpoint, and most AI based teams doesn’t have solid understanding. but how ECR ca foretellixout of box verification automation solution for ADAS and highway functions, developed based on input from OEMs, regulators and compliance bodies. Coverage Driven Verification, Foretellix’s mission is to enable measurable safety of ADAS & autonomous vehicles, enabled by a transition from measuring ‘quantity of miles’ to ‘quality of coverage’ how it works what to test ? Industry proven verification plan and 36 scenario categories covering massive number of challenges and edge cases When are you done? Functional coverage metrics to guide completeness of testing this is ADAS L2 V&V solution, mostly in functional metric test. Open Language: M-SDLM-SDL is an open, human readable, high level language that allows to simplify the capture, reuse and sharing of scenarios, and easily specify any mix of scenarios and operating conditions to identify previously unknown hazardous core & edge cases. It also allows to monitor and measure the coverage of the autonomous functionality critical to prove ADAS & AV safety, independent of tests and testing platforms. atlatecHD map for simualtion, digitial twin of real road, integrated well with simulation suppliers, e.g. IPG, Vires, Prescan. ivexprovides qualitatively safety assessment of planning and decision making for all levels of ADs during development. safety assessment toolwhat is safety requirements during AD development ? what’s the KPIs to represent these safety requirements ? is safety requirements iterative in each functional iteration, or done once for ever ? safety requirements can be validate before release, after then, when new corner cases detected, need to do safety validate automatically. so in this way, safety and function should keep in the same step. the ability of safety assessment tool : unsafe cases and decision detection from a large amount of scenarios a qualitative metric of safety (RSS model) statistical analysis of risk and safety metrics safety co-pilotguarantee safety of a planned trajectory. The safety co-pilot uses the safety model to assess whether (1) a situation is safe, and (2) a planned trajectory for the next few seconds can be considered as safe, accounting for predicted movements of other objects and road users. in a word, safety is big topic, but I can’t see the tech behind how Ivex solve it. nvidiaintelsummaryfrom the top ADS simulation teams, we see a few trends: simulation driven P&P iteration simulation driven V&V can simulation be an independent tool, or it is customized and in-house services ? maybe like CAE ages, in the early time, Ford/GM have their own CAE packages, later CAE tools are maintained by external suppliers. when ADS simulation is matured, there maybe few independent ADS simulation companies, like Ansys, Abaqus in CAE fields. or a totally differenet story I can’t image here. referSelfDriving.fyi north america incubator zhihu pegasus symposium 2019]]></content>
<tags>
<tag>simulation</tag>
</tags>
</entry>
<entry>
<title><![CDATA[system validation in l2 adas]]></title>
<url>%2F2020%2F09%2F20%2Fsystem-validation-in-l2-adas%2F</url>
<content type="text"><![CDATA[backgroundfrom system requirements, the functional models can be created. the bottom line of ADAS is safety, which defines the safety requirements. during ADAS system validation, both function modules and safety modules should be covered. my experience lay in SiL a lot. a question recently is how much percentage between SiL and HiL is a good choice? a few years early, SiL, especially world-sim, e.g. LGSVL, Carla are highly focused. the big companies said, to achive a L3+ autonmous driving system, there needs at least 10 billion miles driving test, which is almost impossible in traditional vehicle desing lifecycle, which gives a life for virtual sim driving validation. which even not consider the iteration of ADAS algorithms. as if there is no mechanism to gurantee the previous road data is valid to validate the new version of algorithms, they have to collect another 10 billion miles for each algorithm version iteration. but now most OEM choose the gradually evaluation way from ADAS to AD. during ADAS dev and test, Bosch has mature solutions, they don’t need 10 billion miles to validate AEB, ACC, LK kind of L2 ADAS functions, usually 1500 hrs driving data is enough, and they have ways to re-use these driving hrs as much as possible during their ADAS algorithms iteration. I am thinking the difference is about function-based test and scenario-based test. ADAS functions always have a few limited if-else internally and state machines in system-level. for function-based test, to cover all these if-else logic and state machines jumping, with additional safety/failure test, the ADAS system can almost gurantee to be a safe and useful product. there is no need to cover all scenarios in the world for L2 ADAS system. of course, once someone did find out all physical scenarios in the world, they can define new ADAS features. and add these new features to their products. in a word, L2 ADAS is a closed-requirement system, the designers are not responsible for any cases beyond the designed features. For L3 AD system, the designers is responsible for any possible scenarios, which is a open-requirements system, there is no gurantee the design features is enough to cover all scenarios in real world, so which need scenario-based test and keep evaluating through OTA. For L2 ADAS, OTA is not a have-to option in some way, but L3 ADAS has to OTA frequently. OTA for l2 adas is to improve the performance of known features, through analysis of big driving data, but it doesn’t discovery novel features. OTA for l3 adas need to improve the performance of known features, as well as to discover new features as much as possible and as quick as possible. which of course bring requirement of OEM’s data platform. Tesla is currently the only OEM, who can close the data loop, from customer’s driving data, to OEM’s cloud, and OTA new/better features back to customers. there is no doubt, Tesla evaluation is scenario-based. collecting scenarios is not only for validate existing systems, or test purpose, but more for discover new features, e.g. from lane/edestrain/vehicle detection to traffic light detection e.t.c. the traditional OEMs still don’t know how to use scenario data valuablely, only image the collected data can do validate or calibrate a better performance known features. ADAS HiLHiL helps to validate embedded software on ECUs using simulation and modeling teches. at the bottom of V model, there are software dev and test, the upper right is software/hardware integration test, namely HiL. 
as I understand, the purpose of HiL is software and hardware integration test, and there are two main sub validation: safety requirements, and function requiements. to validate safety in HiL, fault injection is used. safety module is very bottom-line, and any upper functional module is rooted from safety module. in a narraw view, fault injection only focus on known or system-defined faults, which is finite cases; in a wide view, any upper functional cases can reach to a fault injection test case, which is almost unlimited. in the narraw view, coverage of functions(both function internnal state machine and across-function jump trig mechanism) is iterable, even though its total size may be hundreds of thousands. ADAS massively production solution1R1V(3R1V) Level 2 ADAS system, some OEMs called it as highway Assist(HWA), mostly provided by Tier1, e.g. Bosch. 5R1V(5R5V) Level 2+ ADAS system, some OEMs called it as HighWay Pilot(HWP), hidden meaning it’s L3, but in fact, not a fully L3. 8R1V(5R12V) camera-priority solution, used in Tesla and Mobileye. L2 ADAS system validationfrom traditional vehicle system test and validation, there are finite cases, e.g. ESP interfaces. for ADAS system validation, which is scenario based, and mostly there are infinite cases. there are different ways to cover ADAS system validation: fault injection test, the vehicle powertrain, ADAS sensors, ADAS SoC/MCU, all these HW has random probability to fail. Fault Injectionfault injection tests is used to increase the test coverage of Safety Requirements. Within the fault injection tests arbitrary errors are introduced into the system to proof the safety mechanisms that have been encoded in the software to examine. fault injection tests is only a subset of Requirements based testing. ISO 26262-4 [System] describes the Fault Injection Test as follows:The test uses special means to introduce faults into the item. This can be done within the item via a special test interface or specially prepared elements or communication devices. The method is often used to improve the test coverage of the safety requirements, because during normal operation safety mechanisms are not invoked. The ISO 26262-4 [System] defines the fault injection test as follows: Fault Injection Test includes injection of arbitrary faults in order to test safety mechanisms (e.g. corrupting software or hardware components, corrupting values of variables, by introducing code mutations, or by corrupting values of CPU registers). arbitrary faults is actually from a known list of errors. fault injection is good at testing known sorts of bugs or defect, but poor at testing novel faults or defects, which are precisely the sorts of defects we would want to discover. therefore, fault injection in ADAS is to verify the designed safety mechanism/responses. to verfication of the system. if an ADAS system is designed to tolerate a certain class of faultss, then these faults can be directly injected into the system to examine their responses. for errors which is too infrequent to effectively test in the field, fault injection is powerful to acclerate the occurence of faults. in summary, fault injection is more verification tool, rather than a tool to improve performance or find novel design faults. software verification and validationverification is about whether there is the functions as required. validation is about how good the functions are as required. verfication tech includes: dynamic testing and static testing. e.g. 
functional test, random test; validation techniques include: (hw/sw) fault injection, dependency analysis, hazard analysis, risk analysis e.t.c. software verification is usually a combination of review, analysis and test. review and analysis are performed on the following components: requirement analysis, software architecture, source code, outputs of the integration process, test cases and their procedures and results. the purpose of testing is to demonstrate that the software satisfies all requirements and that errors which could lead to unacceptable failure conditions are handled by safe strategies. it usually includes: hw/sw integration test, to verify the software operates correctly on the hardware; software integration test; low-level test. CANape: measurement device plugged into the CANape box under a special channel; group the signals that need to be recorded; trigger strategy to start recording; calibration; visualization. CANoe. refer: ADAS/AD dev: L2+ ADAS/AD sensor arch; ADAS/AD dev: SoC chips solution; ADAS/AD dev: massive production solutions; using Fault Insertion Units for electronic testing; Fault injection test in ISO 26262 – Do you really need it; paper: Fault Injection in automotive standard ISO 26262: an initial approach; fault injection from CMU EE; verification/validation/certification from CMU EE; the challenge of ADAS HiL; integrated ADAS HiL system with the combination of CarMaker and various ADAS test benches; Vector: online and offline validation of ADAS ECUs
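circling back to the fault-injection discussion above, here is a minimal sketch of the core idea at the software level: corrupt an input on purpose and check that the designed safety mechanism reacts as specified. the wheel-speed signal, its plausibility range and the pass/fail convention are my own illustrative assumptions, not the ISO 26262 wording or any real ECU code.

// fault-injection sketch (illustrative): corrupt an input signal on purpose and
// verify that the designed safety mechanism (a plausibility check) reacts as specified.
#include <cmath>
#include <iostream>

struct WheelSpeedSignal {
    double value_kmh;
    bool valid;   // set false by the safety mechanism when the value is implausible
};

// the safety mechanism under test: flags values outside the plausible range
void plausibilityCheck(WheelSpeedSignal& s) {
    if (std::isnan(s.value_kmh) || s.value_kmh < 0.0 || s.value_kmh > 400.0) {
        s.valid = false;           // expected safety reaction: mark signal invalid
    }
}

int main() {
    // nominal case
    WheelSpeedSignal ok{80.0, true};
    plausibilityCheck(ok);

    // injected fault: corrupt the value as a test interface would
    WheelSpeedSignal faulty{80.0, true};
    faulty.value_kmh = -1.0;       // arbitrary fault taken from the known fault list
    plausibilityCheck(faulty);

    std::cout << "nominal signal valid:  " << std::boolalpha << ok.valid << "\n";
    std::cout << "injected fault caught: " << !faulty.valid << "\n";
    return (ok.valid && !faulty.valid) ? 0 : 1;   // pass/fail for the test runner
}]]></content>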
<tags>
<tag>adas</tag>
</tags>
</entry>
<entry>
<title><![CDATA[adas data logging solution know how]]></title>
<url>%2F2020%2F08%2F18%2Fadas-data-logging-solution-know-how%2F</url>
<content type="text"><![CDATA[Data replay offers an excellent way to analyze driving scenarios and verify simulations based on real-world data. This can be done with HIL systems as well as with data logging systems that offer a playback mechanism and software to control the synchronized playback. Captured data can be replayed in real time or at a slower rate to manipulate and/or monitor the streamed data. An open, end-to-end simulation ecosystem can run scenarios through simulations via a closed-loop process. overcoming logging challenges in ADAS dev pdf: ada logging solution from Vector confidently record raw ADAS sensor data for drive tests from NI online and offline validation of ADAS ECUs pdf: data recording for adas development from Vector pdf: time sync in automotive Ethernet mobileye fleets solutions pdf: QNX platform for ADAS 2.0 time sync in modular collaborative robots pdf: solving storage conundrum in ADAS development and validation validation of ADAS platforms(ECUs) with data replay test from dSPACE RTMaps from dSpace vADASdeveloper from vector date driven road to automated driving ASAM OpenSCENARIO doc open loop HiL for testing image processing ECUs github: open simulation interface sensor models from Pegasus scenario analysis and quality measures from Pegasus pdf: Pegasus method an overview]]></content>
<tags>
<tag>logging</tag>
<tag>adas</tag>
</tags>
</entry>
<entry>
<title><![CDATA[autosar know-how]]></title>
<url>%2F2020%2F08%2F18%2Fautosar-know-how%2F</url>
<content type="text"><![CDATA[the following are knowledges from Internet, if copyright against, please contact me. autosar: 微控制器抽象层 MCAL微控制器驱动 GPT Driver WDG Driver MCU Driver Core test 存储器驱动通信驱动 Ethernet驱动 FlexRay 驱动 CAN 驱动 LIN 驱动 SPI 驱动 I/O驱动AUTOSAR:基础软件层AUTOSAR软件体系结构包含了完全独立于硬件的应用层(Application Layer)和与硬件相关的基础软件层(BasicSoftware,BSW),并在两者中间设立了一个运行时环境(Run Time Environment),从而使两者分离,形成了一个分层体系架构。 基础软件层组件 系统,提供标准化(os, timer, error)规定和库函数 内存,对内、外内存访问入口进行标准化 通信,对汽车网络系统、ECU间、ECU内的通信访问入口进行标准化 I/O, 对传感器、执行器、ECU的IO进行标准化 服务层与复杂驱动系统服务比如, os定时服务、错误管理。为应用程序和基础软件模块提供基础服务。 存储器服务通信服务通信服务通过通信硬件抽象与通信驱动程序进行交互 复杂驱动复杂驱动(CCD)层跨越于微控制器硬件层和RTE之间,其主要任务是整合具有特殊目的且不能用MCAL进行配置的非标准功能模块。复杂驱动程序跟单片机和ECU硬件紧密相关。 Autosar time sync11 Time sync对于自适应平台,考虑了以下三种不同的技术来满足所有必要的时间同步需求: 经典平台的StbM 库chrono -要么std::chrono (c++ 11),要么boost::chrono 时间POSIX接口 TBRs充当时间基代理,提供对同步时间基的访问。通过这样做,TS模块从“真实的(real)”时基提供者中抽象出来。 autoSar: time sync protocol specificationPrecision Time Protocol (PTP) generalized Precision Time Protocol (gPTP) Time Synchronization over Ethernet, IEEE802.1AS time master: is an entity which is the master for a certain Time Base and which propagates this Time Base to a set of Time Slaves within a certain segment of a communication network. If a Time Master is also the owner of the Time Base then he is the Global Time master. time slave: is the recipient for a certain Time Base within a certainsegment of a communication network, being a consumer for this Time Base time measurement with Switches: in a time aware Ethernet network, HW types of control unit exists: Endpoints directly working on a local Ethernet-Controller Time Gateways, time aware bridges, where the local Ethernet-Controller connects to an external switch device. A Switch device leads to additional delays, which have to be considered for the calculation of the corresponding Time Base specification of time sync over EthernetGlobal Time Sync over Ethernet(EthTSyn) interface with: Sync time-base manager(StbM), get and set the current time value Ethernet Interface(EthIf), receiving and transmitting messages Basic Software Mode Manager(BswM), coord of network access Default Error Tracer(DET), report of errors A time gateway typically consists of one Time Slave and one or more TimeMasters. When mapping time entities to real ECUs, an ECU could be Time Master (or even Global Time Master) for one Time Base and Time Slave for another Time Base. time sync in automotive EthernetThe principal methods for time synchronization in the automotive industry are currently based on AUTOSAR 4.2.2, IEEE 802.1AS, and the revised IEEE 802.1AS-ref, which was developed by the Audio/Video Bridging Task Group, which is now known as the TSN (Time Sensitive Networking) Task Group. The type of network determines the details of the synchronization process. For example, with CAN and Ethernet, the Time Slave corrects the received global time base by comparing the time stamp from the transmitter side with its own receive time stamp. With FlexRay, the synchronization is simpler because FlexRay is a deterministic system with fixed cycle times that acts in a strictly predefined time pattern. The time is thus implicitly provided by the FlexRay clock. While the time stamp is always calculated by software in CAN, Ethernet allows it to be calculated by either software or hardware]]></content>
</entry>
<entry>
<title><![CDATA[mcu know-how]]></title>
<url>%2F2020%2F08%2F18%2Fmcu-know-how%2F</url>
<content type="text"><![CDATA[basic knowledgeSPI protocolSPI, Serial Peripheral Interface。使MCU与各种外围设备以串行方式进行通信以交换信息。SPI总线可直接与各个厂家生产的多种标准外围器件相连,包括FLASHRAM、网络控制器、LCD显示驱动器、A/D转换器和MCU等。 SPI接口主要应用在EEPROM、FLASH、实时时钟、AD转换器, ADC、 LCD 等设备与 MCU 间,还有数字信号处理器和数字信号解码器之间,要求通讯速率较高的场合。 TAPI protocolTAPI(电话应用程序接口)是一个标准程序接口,它可以使用户在电脑上通过电话或视频电话与电话另一端的人进行交谈。TAPI还具备一个服务提供者接口(SPI),它面向编写驱动程序的硬件供应商。TAPI动态链接库把API映射到SPI上,控制输入和输出流量。 MCU timingSoftware design is easy when the processor can provide far more computing time than the application needs. in MCU, the reality is often the opposite case. the following are some basic know-how, before we can jump into ADAS MCU/SOC issues. scheduling sporadic and aperiodic events in a hard real-time systemA common use of periodic tasks is to process sensor data and update the current state of the real-time system on a regular basis. Aperiodic tasks are used to handle the processing requirements of random events such as operator requests. An aperiodic task typically has a soft deadline. Aperiodic tasks that have hard deadlines are called sporadic tasks. Background servicing of aperiodic requests occurs whenever the processor is idle (i.e., not executing any periodic tasks and no periodic tasks are pending). If the load of the periodic task set is high, then utilization left for background service is low, and background service opportunities are relatively infrequent. However, if no aperiodic requests are pending, the polling task suspends itself until its next period, and the time originally allocated for aperiodic service is not preserved for aperiodic execution but is instead used by periodic tasks foreground-background scheduling periodic tasks are considered as foreground tasks sporadic and aperiodic tasks are considered as background tasks foreground tasks have the highest priority and the bc tasks have lowest priority. among all highest priority, the tasks with highest priority is scheduled first and at every scheduling point, highest priority task is scheduled for execution. only when all foreground tasks are scheduled, bc task are scheduled. completion time for foreground task for fg task, their completion time is same as the absolute deadline. completion time for background task when any fg task is being excuted, bc task await. let Task Ti is fg task, Ei is the amount of processing time required over every Pi period. Hence, 1Avg. CPU utilization for Ti is Ei/Pi if there are n periodic tasks(fg tasks), e.g. T1, T2, … Tn then total avg CPU utilization for fg taskes: 1fg_cpu_util = E1/P1 + E2/P2 + ... + En/Pn then the avg time available for execution of bg task in every unit of time is: 11 - fg_cpu_util let Tb is bg task, Eb is the required processing time, then the completion time of bg task: 1= Eb / (1 - fg_cpu_util) in ADAS system, usually there exist bost periodic tasks(fg) and aperiodic tasks(bc), and they are corelated, fg tasks always have higher priority than bc tasks. so there is chance when a certain fg task rely on output of another bc task, which is not or partially updated due to executing priority, then there maybe some isues. referVector: Functional safety and ECU implementation autosar OS measures task execution times 3min看懂mcu 设计定时任务系统]]></content>
<tags>
<tag>mcu</tag>
</tags>
</entry>
<entry>
<title><![CDATA[verfication and validation in ADAS dev]]></title>
<url>%2F2020%2F08%2F18%2Fverfication-and-validation-in-ADAS-dev%2F</url>
<content type="text"><![CDATA[Verification and Validation in V-modelverification whether the software conforms the specification finds bugs in dev output is software arch e.t.c QA team does verification and make sure the software matches requirements in SRS. it comes before validation validation it’s about test, e.g. black box test, white box test, non-functional test it can find bugs which is not catched by verification output is an actual product V-model is an extension of waterfull model. The phases of testing are categorised as “Validation Phase” and that of development as “Verification Phase” how simulation helps ? for validation to cover known and rara critical scenarios. for verification to test whether the component function as expected model based to data driven methodlogy completeness known 0-error-gurantee iterable example model based 1 1 1 0 NASA data driven 0 0 0 1 Tesla Motor with traditionally vehicle/airspace development, the system team first clearly define all requirements from different stakeholders and different departments, which may take a long time; then split to use cases, which is garanteed to be complete and all-known set, and garantee zero-error, the engineers just make sure each new development can meet each special verification. the general product design cycle is 3 ~ 5 years, and once it release to market, there is little chance to upgrade, except back to dealer shops. model based mindset need to control everything at the very beginning, to build a clear but validatable super complex system at very first, even it takes long long time. and the development stage can be a minor engineering process with a matured project driven management team. the goods of model based mindset is it strongly gurantee known requirements(including safety) is satisfied, while the drawback is it lose flexibility, slow response to new requirements, which means lose market at end. to validate model based design ADS system, usually it highlights the full-cover or completeness of test cases, which usually can be pre-defined in semantic format(.xml) during system requirement stage. then using a validation system(e.g. a simulator) to parse the semantic test cases, and produce, run and analysis the test scenarios automatically. in data driven ADS system, the system verification and validation strongly depends on valid data. 1) data quantity and quality. as the data set can cover as many as plenty kinds of different scenarios, it’s high qualitied; while if the data set have millions frames of single lane high way driving, it doesn’t teach the (sub)system a lot really. 2) failure case data, which is most valueable during dev. usually the road test team record the whole data from sensor input to control output, as well as many intermediate outputs. what means a failure case, is the (sub)system (real) output doesn’t correspond to expected (ideal) output, which is a clue saying something wrong or miss-handled of the current-version ADS system. of course, there are large mount of normal case data. 3) ground truth(G.T.) data, to tell the case as either normal or failure, needs an evaluation standard, namely GT data. there are different ways to get GT data in RD stage and massive-producing stage. in RD stage, we can build a GT data collecting vehicle, which is expensive but highly-valued, of course it’s better to automatically generate G.T. data, but manually check is almost a mandatory. a low-efficient but common way to get GT data by testing driver’s eye. 
after data collection, there is usually offline data processing guided by the event log kept by the driver, so the offline processing can label scenarios as normal cases or failure cases; usually we care more about the failure cases. in the mass-production stage there is no GT data collection hardware, but there is a massive data source from which to build a very high-confidence classifier of failure vs normal. 4) sub-system verification is another common usage of data, e.g. for the fusion module, AEB module, etc. due to the limitations of existing sensor models and the realism of the SiL env, physical sensor raw data is more valuable for verifying a subsystem, since it includes the practical sensor parameters, physical performance, physical vehicle limitations etc., compared to synthetic sensor data from a simulator, which is either ideal, or only statistically equivalent, or too costly to reach physical-level fidelity. 5) AI model training, which consumes huge amounts of data. in the R&D stage it is difficult to get that much road data, so synthetic data is used a lot, but it has to be mixed with a portion of physical road data to guarantee the model doesn't over-fit to the semi-synthetic data. on the other hand, it's a totally different story if someone can obtain data from a mass-production fleet, as Tesla patented: SYSTEM and METHOD for obtaining training data: An example method includes receiving sensor data and applying a neural network to the sensor data. A trigger classifier is applied to an intermediate result of the neural network to determine a classifier score for the sensor data. Based at least in part on the classifier score, a determination is made whether to transmit via a computer network at least a portion of the sensor data. Upon a positive determination, the sensor data is transmitted and used to generate training data. which is really an AI topic in itself: learning from massive sensor data to understand what a failure case is. 6) AI model validation: validation should depend on a labeled, valid dataset, e.g. G.T. data, or data verified by an existing system, e.g. some teams trust Mobileye's output as G.T. data. 7) (sub)system validation. SiL, semantic driven: this mostly corresponds to model based dev. there are a few issues with semantic driven SiL: 1) building a close-to-realistic sensor model, but how realistic is it compared to the real physical sensor? 80%? 2) the virtual env from the SiL simulator, which depends on its virtual modeling ability. 3) 1) + 2) --> synthetic sensor raw data, which may be about 60%~80% realistic compared to road test recordings. 4) is there a systematic bias in the virtual/synthetic world? during R&D road tests, we can record a failure scenario as semantic metadata (such as a kind of OpenX/Python script), and also record the whole sensor & CAN data. with the semantic metadata, we import it into the SiL simulator, which creates a virtual world of the scenario. if the sensor configuration (both intrinsic and extrinsic) in the simulator is the same as the configuration on the real test vehicles, and our sensor model in the simulator behaves statistically equal to the real sensors (check the sensor statistical parameters), then it is an almost statistically realistic sensor model. a semantic scenario description framework (SSDF) is a common way to generate/manage/use verification (functional and logical) scenario libraries and validation (concrete) scenario libraries. the benefit of SSDF is that it is a natural extension of the V-model: after defining the system requirements, test cases are generated for each use case, namely as semantic scenarios. 
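as a toy illustration of expanding a logical scenario into concrete test cases (my own sketch, not from any particular SSDF or OpenSCENARIO tooling; the parameter names and ranges are invented for the example):

import itertools

# a "logical" cut-in scenario: each parameter is given as a discrete range
logical_scenario = {
    "ego_speed_kph": [60, 80, 100],
    "cut_in_gap_m": [10, 20, 30],
    "cut_in_speed_delta_kph": [-10, 0],
}

def expand_to_concrete(logical):
    # full-factorial expansion of the parameter ranges into concrete test cases
    keys = list(logical.keys())
    for values in itertools.product(*(logical[k] for k in keys)):
        yield dict(zip(keys, values))

concrete_cases = list(expand_to_concrete(logical_scenario))
# each concrete case can then be serialized into the simulator's scenario format and run/analysed automatically
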
but as mentioned above, how precise is the SiL performance, especially when it comes to a statistically equal sensor model? namely, how do we validate the accuracy loss, or even the reliability loss, between the synthetic and real environments? this is usually not considered well in semantic scenario based SiL. no doubt synthetic data, or a pure semantic scenario, has its own benefits, namely fully labeled data, which can be used as ground truth in the virtual world or as input for AI training. again, we need to confirm how reliable these labels are before we can trust them 100%. Ideal ground truth/probabilistic sensor models are typically validated via software-in-the-loop (SiL) simulations. Phenomenological/physical sensor models are physics-based: they are based on the measurement principles of the sensor (i.e. camera image formation, radar wave propagation) and play a role in simulating phenomena such as haze, glare effects or precipitation. They can generate raw data streams, 3-D point clouds, or target lists. sensor statistical evaluation: sensor accuracy model, roughly following an exponential distribution along with the distance in the x-y plane; furthermore the sensor accuracy depends on speed, scenario, etc. sensor detection reliability (including false detections and missed detections), roughly following a normal distribution; this can be further split into obstacle detection reliability and lane detection reliability. sensor type classification reliability, roughly a normal distribution. sensor object tracking reliability, roughly a normal distribution. vendor nominal intrinsics: there is always a gap between the vendor nominal intrinsics and the sensor's real performance, e.g. max/min detection distance, FOV, angle/distance resolution etc., so we can use test-verified sensor parameters as input for the sensor model, rather than the vendor nominal parameters. as mentioned, there are many other factors in getting a statistically equal sensor model, which can be added iteration by iteration. the idea above is a combination of statistical models; if there is a way to collect sensor data massively, a data-driven machine learning model should be better than that combination of statistical models. data is the new oil: at the first stage we saw lots of public online data sets, especially for perception AI training, from different companies e.g. Cognata, Argo AI, Cruise, Uber, Baidu Apollo, Waymo, Velodyne etc. after a while, the roadmap to the AI models/algorithms has been reached by many teams; the real gap between a great company and a demo team is increasingly in building a massive data collection pipeline, rather than simply verifying that the algorithm works. that is the difference between Tesla and a small AI team. this is a big jump from the traditional model based mindset to a data-driven mindset, including data driven algorithms as well as the data pipeline. in China, data centers/cloud computing/5G are called the new infrastructure, which definitely accelerates building up the data pipeline from a mass-production fleet. refere: difference of V vs V; what are false alerts on a radar detector; surfelGAN: synthesizing realistic sensor data for AD from DeepAI; data injection test of AD through realistic simulation; avsimulation; BMW: development of self driving cars; understand.AI, a dSPACE company; cybertruck talk tesla club]]></content>
</entry>
<entry>
<title><![CDATA[mf4 know how]]></title>
<url>%2F2020%2F07%2F07%2Fmf4-know-how%2F</url>
<content type="text"><![CDATA[preparationpip reinstall uninstall current pip apt-get install python-pip #which install the default 8.1 version sudo -H pip2 install –upgrade pip #which upgrade to 20.1 version 1pip install --install-option="--prefix=$PREFIX_PATH" package_name why does pip3 say I am using v8.1.1, however version 20.1.1 is avail install asammdfgithub: asammdf it’s recommended to use conda env to manage this project. name this env as mdf. it’s recommended to install packages through conda install, rather than system apt-get install. the following packages is required: 1234567conda install numpyconda install pandasconda install -c conda-forge dbusconda install lxml=4.5.0conda install -c conda-forge canmatrixconda install -c conda-forge asammdfconda install pyqt5 #optionally but recommended, if you need asammdf GUI tool in Linux take care the pkg installed path, either globally in conda/pkgs/ or in the special env /conda/envs/mdf/lib/python3/site-packages, can simplely import asammdf to see $PYTHONPATH found the module. conda pythonpathcheck existing system path. 123import syssys.path ['', '/usr/lib/python3/dist-packages', '/home/anaconda3/envs/mf4/lib/python3.8/site-packages'] PYTHON loads the modules from the sys.path in the order, so if PYTHON find the required module in the first PATH, which however is the wrong version, then it’s an error. does anaconda create a separate PYTHONPATH for each new env: each environment is a completely separate installation of Python and all the packages. there’s no need to mess with PYTHONPATH because the Python binary in the environment already searches the site-packages in that environment, and the libs of the environment. in one word, when using conda, don’t use system PYTHONPAH install asammdf[gui] test with PyQt5 12python3 import PyQt5 reports Could not load the Qt platform plugin “xcb” in “” even though it was found the reason is due to libqxcb.so from ~/anaconda3/envs/aeb/lib/python3.6/site-packages/PyQt5/Qt/plugins/platforms not found libxcb-xinerama.so.0, fixed by install libxcb-xinerama0 123ldd libqxcb.so $ libxcb-xinerama.so.0 => not foundsudo apt-get install libxcb-xinerama0 install 12export PYTHONPATH=""pip install asammdf[gui] 123Requirement already satisfied: pyqtgraph==0.11.0rc0; extra == "gui" in /home/gwm/anaconda3/envs/aeb/lib/python3.6/site-packages (from asammdf[gui]) (0.11.0rc0)Requirement already satisfied: psutil; extra == "gui" in /home/gwm/anaconda3/envs/aeb/lib/python3.6/site-packages (from asammdf[gui]) (5.7.0)Requirement already satisfied: PyQt5>=5.13.1; extra == "gui" in /home/gwm/anaconda3/envs/aeb/lib/python3.6/site-packages (from asammdf[gui]) (5.15.0) numpy utils output precision 12345678res = np.where(arr==roi)for x in np.nditer(arr): print(x)np.set_printoptions(precision=3)np.set_printoptions(suppress=True)np.around([], decimals=2)for (k,v) in dict.items(): print (k, v) asammdfbypass non-standard msgmost OEM mdf4 files are collected by Vector CANape tools, which may include many uncommon types, such as Matlab/Simulink objects, measurement signals(multimedia) cam-stream e.t.c, which can’t be parsed by current asammdf tool. so a simple fix is to bypass these uncommon data type, submit in the git issue. bytes object to numerical valuesbytes objects basically contain a sequence of integers in the range 0-255, but when represented, Python displays these bytes as ASCII codepoints to make it easier to read their contents. 
Because a bytes object consist of a sequence of integers, you can construct a bytes object from any other sequence of integers with values in the 0-255 range, like a list: 12bVal = bytes([72, 101, 108, 108, 111])strVal = bVal.decode('utf-8') in asammdf, the numerical ndarray can be stored as uint8, uint32, uint64 e.t.c, but with a different range and representation. for radar/camera detected obj id, the range is in [0~255], so here need output as uint8. sample of ObjID as asammdf.Signal: 12345678910111213141516<Signal MRR_ObjID_3: samples=[b'' b'' b'' ... b'' b'' b''] timestamps=[ 0.08185676 0.13073605 0.18073289 ... 57.5313583 57.5813593 57.63134464] invalidation_bits=None unit="" conversion=None source=<asammdf.blocks.source_utils.Source object at 0x7f79b91c7ae8> comment="<CNcomment><TX/><address byte_count="1" byte_order="BE">0x0008</address></CNcomment>" mastermeta="('t', 1)" raw=False display_name= attachment=()> mf4_readerfirst, we need define a high-level APIs to handle collected mf4 from road test. which often includes a bunch of groups for each sensor, and a few channels to record one kind of Signal in one sensor. and another dimension is time, as each Signal is a time serial. the sample Signal output also shows there are two np.narray: samples and timestamps. 123-> group ----> channel--------> samples[timestampIndex] what we need is to repackage the signals from mf4 as structures of input(mostly like sensor packages) and output(another structure or CAN package) to any model required, e.g. fusion/aeb. init_multi_read(group_name, channel_nums, obj_nums, channel_list) channel_nums, gives the number of channels/signals for this sensor obj_nums, is the number of objects can detected by a special sensor, which is given by the vendor of the sensor. e.g. Bosch 5th Radar can detect at most 32 objects. channel_list, is the name list of the channels, whose length should equal channel_nums. this API does read all required channels raw data into memeory initially. updateCacheData(group_name, time) time is the given timestamp, this API returns the interested samples at the given time. one trick here is time matching, as very possiblly, the given time doesn’t match any of the recorded timestamp in a special channel, here always seek the most closest timestamp to time. seek_closest_timestamp_index(timestamps, time) get_channel_data_by_name(channel_name, time) get_channel_all_data_by_name(channel_name) 1234567891011121314151617181920from asammdf import MDF4import numpy as npclass mf4_reader(): def __init__(self, mf4_file): self.reader = MDF4(mf4_file) self.channelValsCacheMap = dict() def init_multi_read(group_name, channel_name, obj_nums, channel_namelist): channelValuesMap = dict() for i in range(obj_nums): for base_name in channel_namelist: channel_name = base_name + str(i+1) channel_raw_data = self.get_channel_all_data_by_name(channel_name) if channel_raw_data : channelValuesMap[channel_name] = channel_raw_data else: channelValuesMap[channel_name] = None self.channelValsCacheMap[group_name] = channelValuesMap fusion adapteronce we can read all raw signals from mf4 files, then need to package these signals as the upper-level application requires. e.g. fusion, aeb input structs. taking fusion module as example. the input includes radar structure, camera structure, e.t.c, something like: 12345678class radar_sensor_pack(): self.objID, self.objExistProb , self.objDistx, self.objDisty, self.objRelVelx, self.objRelVely, ... 
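relating back to the mf4_reader API above, one possible sketch of the seek_closest_timestamp_index() helper (my own illustration, assuming the channel timestamps are sorted in ascending order, as asammdf returns them):

import numpy as np

def seek_closest_timestamp_index(timestamps, time):
    # binary-search for the insertion point, then pick the nearer neighbour
    idx = np.searchsorted(timestamps, time)
    if idx <= 0:
        return 0
    if idx >= len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[idx - 1], timestamps[idx]
    return idx - 1 if (time - before) <= (after - time) else idx
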
data collection is done with Vector CANape, and the sensor pack is defined in a *.dbc file, so this step basically packages the dbc signals into a Python sensor_object, and then fills this sensor_object with the values from the mdf4 files. we need to define a set of fusion_adapters to package all the necessary inputs for each module, e.g. fusion, aeb, etc. rosAdapter: another usage of the mf4 files is to convert mf4 to rosbag, since most ADS dev/debug/verification tools are currently based on a ros env; on the other hand, ros based data collection is not as robust as CANape, so the data collection stays in *.mf4. build ros messages and define ros_sensor_obj: as the asammdf reader is in python, we can build the ros messages into a python module, as mentioned previously. since the ros messages are built with catkin_make, which is based on python2.7, we need to add the sensor_msgs python module path to $PYTHONPATH in the conda env fusion. export PYTHONPATH=/home/gitlab_/mf4io/linux/toRosbag/catkin_ws/install/lib/python2.7/dist-packages basically, we use a python3 shell, but we add catkin_ws/python2.7 to its PYTHONPATH. write rosbag: then we can fill in ros_sensor_obj from the mf4 structures and write it to a bag. self.bag.write('/bose_cr/radar/front_right/object', ros_sensor_obj.data) rosbag know-how: rosbag filter rosbag filter 20200619-xx.bag only-mrr.bag "topic==`/bose_mrr/radar/front/object`" read message from a bag file refere: lxml; canmatrix; asammdf; what does a b prefix before a python string mean; numpy data type; write rosbag api; record and play back data; ros_readbagfile.py]]></content>
<tags>
<tag>mdf</tag>
</tags>
</entry>
<entry>
<title><![CDATA[so ViL, now]]></title>
<url>%2F2020%2F06%2F28%2Fso-ViL-now%2F</url>
<content type="text"><![CDATA[backgroundusually Software-in-Loop(SiL) test and verification is defined to cover 90% of all ADS scenarios, then about 9% is test in HiL or ViL, the last 1% is on the real road. so we are on the way to product, so here is ViL. why ViL ?Vehicle-in-Loop(ViL) is the last second step in evaluation and verification before releasing a new vehicle product. as ViL is close to the real car system. the benefits is to offer the real vehicle dynamic, which means the real vehicle respond and delay e.t.c. for functions/features development, ViL also helps to calibrate the boundary of some system parameters, which saves a lot of energy of calibration engineers, as they don’t need to the test field and prepare and run the real vehicles day by day. how ViL works ?the logic of ViL is to integrate the real vehicle (dynamic) into a virtual test environment(VTE): forward: the real vehicle send vehicle dynamic data(vehicle position, heading, speed e.t.c) to VTE; backward: VTE split out env information to the real vehicle, which is handled by SoC in vehicle. so first need prepare a VTE, which has the ability to support external vehicle dynamic(VD) plugin, which is a common feature of VTE, e.g. lgsvl full model interface secondly, VTE has the ability to export sensor data, which includes exporting data type and exporting data protocol. the common solution is rosbag through ros communication. ViL in realitythe ideal way of plugin real vehicle dynamic into VTE is throuh full-model-interface(FMI), which is especially true for SiL. for forward data, vehicle dynamic data here only requies real vehicle position information, which can be obtained in a few ways. from vehicle CAN bus, or from RTK sensor, or from upper application layer, e.g. vehicle status module. but most VTE normally support ros message parsing, rather than CAN message parsing. so additionaly need to define a CAN-ROS message adapter. for backward data, most VTE(e.g. LG) does have some kind of message channels, e.g. sensor ros topices, which can be used to publish the virtual environment information(sensor data). or VTE may have well-supported sensor message exporter. limitationsthe ViL solution above is an useable and very customized solution, so the extenable ability and data/model precision is limited.]]></content>
<tags>
<tag>simulation</tag>
</tags>
</entry>
<entry>
<title><![CDATA[where are you in next 4 years(3)]]></title>
<url>%2F2020%2F06%2F28%2Fwhere-are-you-in-next-4-years-3%2F</url>
<content type="text"><![CDATA[temparately, we have high and low, and of course whether you are a safe-well person tells a lot. I rejected SxxCVxxW, as the interchange doesn’t tell any difference, the only thing is a new brand and a new office. I am looking for some really changes. but what’s that ? I still can’t say, maybe recently I am almost lost the focus. things happen recently: Tesla of course, is leading the rocket rising, with new features in AutoPilot, and new vehicle models always on the way. then Zoox is purchused by Amazon with 1.2 billion USD, how these solution-focused start-ups survive become really interesting now; in China, the top 3 new energy start-ups, namely, Nio, LiXiang, XiaoPeng are beyond self-driving solution lonely, but car makers. and their vehicle model is not that productive, but kind of found a way to survive in Chiese market; the MaaS service providers, such as Didi, Ponyai, WeRide, AutoX are still on the way, and push really harsh to make some noise in Shanghai, Guangzhou and other cities. of course, Waymo, as the top one MaaS service provider globally is also finding its way to break through, just bind to Volvo. individuals and teams are working really hard to find out the way, but where is the way, or at least what’s the right direction in self-driving in next 5 ~ 10 years. technically, there may be still lots of engineering and research work to do; in bussiniess side, looks car-maker teams are really making some progress, just like Tesla, or other traditional OEMs, they follow the gradually evaluation solution to OTA upgrade the vehciles little by little. However, it’s just unbelievable to belive OEM teams can lead such a great disruptive innovation in future human life, rather than these innovative and culture-diverse high-tech start-up teams. there is a foundamental paradox between bussiness and creativity. this is a capital driven bussiness world, every new product is born to create a new needs for human consumers, even more and more needs are unnecessary or anti-human, does this make kind of creativities sound unhuman or irrationaly? I don’t want to answer this question, as I am in one of these teams, and I am even eager to learn how to make success through these creativities. anyway, I am staying in an OEM supported start-up now, and running on the way to product some model next year. I don’t know where is the way later, and I am afraid this half year is not easy.]]></content>
</entry>
<entry>
<title><![CDATA[play with boost.python 2]]></title>
<url>%2F2020%2F06%2F27%2Fplay-with-boost-python-2%2F</url>
<content type="text"><![CDATA[backgroundin 1, we have a basic work flow of boost.python. in reality, the C++ project may have multi headers, where define many structs/classes, and include nested structs. here we give an simple example of nested struct/class. api_header.hpp123456789typedef struct { float time; float dx ; float dy ;} AEB;typedef struct { AEB out1 ;} OP; aeb.hpp1234567891011121314151617#include <iostream>#include <string>#include "api_header.hpp"class aeb { public: aeb() {} OP op_out ; void test_op(){ (void) memset((void *)&op_out, 0, sizeof(OP)); op_out.out1.time = 1.0 ; op_out.out1.dx = 2.0 ; op_out.out1.dy = 3.0 ; AEB o1 = op_out.out1 ; std::cout << o1.time << std::endl ; }}; for each of them, we can write a separate wrapp files. api_wrapper.cpp1234567891011121314151617#include <Python.h>#include <boost/python.hpp>#include "api_struct.h"using namespace boost::python;BOOST_PYTHON_MODULE(api_wrapper) { class_<AEB>("AEB", "aeb struct") .def_readwrite("time", &AEB::time) .def_readwrite("dx", &AEB::dx) .def_readwrite("dy", &AEB::dy) ; class_<OP>("OP", "OP struct") .def_readwrite("out1", &OP::out1);} aeb_wrapper.cpp1234567891011121314#include <Python.h>#include <boost/python.hpp>#include "student.h"using namespace boost::python;BOOST_PYTHON_MODULE(aeb) { scope().attr("__version__") = "1.0.0"; class_<aeb>("aeb", "a class of aeb ADAS") .def(init<>()) .def(init<std::string, int>()) .def_readwrite("op_out", &aeb::op_out) .def("test_op", &aeb::test_op, "test op")} compared to previous blog, we had def_readwrite() to include the nested struct in aeb python module, which is dynamic linked during python runtime, so which requires to import api_header module first, then import aeb module. build && test123g++ -I/home/anaconda3/envs/aeb/include/python3.6m -I/usr/local/include/boost -fPIC wrap_api.cpp -L/usr/local/lib/ -lboost_python36 -shared -o wrap_api.sog++ -I/home/anaconda3/envs/aeb/include/python3.6m -I/usr/local/include/boost -fPIC wrap_student.cpp -L/usr/local/lib/ -lboost_python36 -shared -o student.so 1234567import wrap_apiimport aebs = aeb.aeb()s.test_op()a = s.op_out.out1 a.timea.dx summarythis basic workflow is enough to package Matlab/Simulink ADAS models C++ code to Python test framework.]]></content>
<tags>
<tag>python</tag>
<tag>boost</tag>
</tags>
</entry>
<entry>
<title><![CDATA[boost.python 1]]></title>
<url>%2F2020%2F06%2F26%2Fboost-python-1%2F</url>
<content type="text"><![CDATA[backgroundin OEM’s ADS team, there are bunch of model-based design engineers, who build the ADAS features based on Matlab/Simulink tools, which is good to build quick demos, when comes to massive data verification, we can’t really depends on Matlab solver, which is so slow and licensed. so a common idea is to recompile the Matlab/Simulink model to C/C++ code, which can further embedded to more open envs, e.g. python or C++. as previously mentioned, we had designed a rq based massive data driven test framework, so the gap from C++ ADAS code to this python test framework is fixed in this blog. there are a few wasy to integrate C++ code to Python, one is Boost.Python: setupinstall boost configuring Boost.BUild python env : 3.6 (from conda env aeb)gcc: 4.5.0 1export BOOST_BUILD_PATH=`pwd` #where we keep `user-config.jam` user-config.jam12345678910111213141516using gcc : 5.4.0 : /usr/bin/g++ ;using python : 3.6 : "/home/anaconda3/envs/aeb/bin/python" : "/home/anaconda3/envs/aeb/include" : "/home/anaconda3/envs/aeb/include/python3.6m" ; ``` bootstrap will find `user-config.jam` from $BOOST_BUILD_PATH. ```sh cd /path/to/boost_1_73_0./bootstrap.sh --help./bootstrap.sh --prefix=/usr/local/ --show-librariesb2 --with-python --prefix="/usr/local/" install variant=release link=static address-model=64 b2 --clean a sample of user-config.jam error fixing 12fatal error: pyconfig.h: No such file or directorycompilation terminated. need export CPATH=~/anaconda/envs/aeb/include/python3.6m/, where located pyconfig.h and other headers finally report: boost.python build successfully ! demo runthe following is simple sample of how to use boost_python wrapper to wrapping an AEB model(in c++) to python aeb.h1234567891011121314151617181920212223242526#include <iostream>#include <string>typedef struct { float time; float dx ; float dy ;} AEB;typedef struct { AEB out1 ;} OP;class aeb { public: Student() {} OP op_out ; void test_op(){ (void) memset((void *)&op_out, 0, sizeof(OP)); op_out.out1.time = 1.0 ; op_out.out1.dx = 2.0 ; op_out.out1.dy = 3.0 ; AEB o1 = op_out.out1 ; std::cout << o1.time << std::endl ; }}; wrap_aeb.cpp12345678910111213141516171819202122232425262728293031323334353637383940#include <Python.h>#include <boost/python.hpp>#include <boost/python/suite/indexing/vector_indexing_suite.hpp>#include "aeb.h"using namespace boost::python;BOOST_PYTHON_MODULE(aeb) { scope().attr("__version__") = "1.0.0"; scope().attr("__doc__") = "a demo module to use boost_python."; class_<aeb>("aeb", "a class of aeb") .def(init<>()) .def("test_op", &aeb::test_op, "test op") .def_readonly("op_out", &aeb::op_out)}``` tips, if aeb.h and aeb.cpp are separated files, it's better to merge them first; for nested structure in wrapper is another topic later.#### build and python import * check header location Python.h @ `/home/anaconda3/envs/aeb/include/python3.6m` boost/python @ `/usr/local/include/boost`* check .so lib location /usr/local/lib/tips, if there is duplicated `boost lib` in system, e.g. `/usr/lib/x86_64-linxu-gnu/libboost_python.so` which maybe conflict with `boost_python` install location at `/usr/local/lib/libboost_python36.so`* build ```shg++ -I/home/anaconda3/envs/aeb/include/python3.6m -I/usr/local/include/boost -fPIC wrap_aeb.cpp -L/usr/local/lib/ -lboost_python36 -shared -o aeb.so test 123import aebt = aeb.aebt.test_op() summarythis blog gives the basic idea how to use boost.python to integrate c++ to python test framework. there are plenty details need fixed, e.g. 
nested structures, shared pointers. maybe I will share those in the next blog. refere: boost.python tutorial]]></content>
<tags>
<tag>python</tag>
<tag>boost</tag>
</tags>
</entry>
<entry>
<title><![CDATA[play with ros 2]]></title>
<url>%2F2020%2F06%2F22%2Fplay-with-ros-2%2F</url>
<content type="text"><![CDATA[ros msgsros msgs usually used in C++, as our ADS data tool is implemented in Python, I’d try to build *.msg to python module. find two blogs from ROS doc: writing a ROS python Makefile create a ros msg to build msg into python module, the package.yml should at least include the following lines: 123456<buildtool_depend>catkin</buildtool_depend><build_depend>message_generation</build_depend><build_export_depend>rospy</build_export_depend><build_export_depend>std_msgs</build_export_depend><exec_depend>rospy</exec_depend><exec_depend>std_msgs</exec_depend> the default CMakeList.txt looks like: 12345678910111213141516find_package(catkin REQUIED COMPONENTS roscpp rospy std_msgs message_generation) add_message_files( FILES custom.msg)generate_messages( DEPENDENCIES std_msgs) the generated msg python module is located at ~/catkin_ws/devel/lib/python2.7/dist-packages/my_msg_py/msg,which can add to $PYTHONPATH for later usage create ros node with catkinfirst check your $ROS_PAKCAGE_PATH, the default pkg path is /opt/ros/kinetic/share, append custom pkgs path from ~/cakin_ws/devel/share. 123cd ~/my_catkin_ws/srccatkin_create_pkg my_pkg [dependencies, e.g. sd_msgs rospy]rospack find catkin_create_pkg will create a CMakeList.txt at pkg level, and a src folder, where can hold custom nodes definition. sensor serial data to ros nodesensors(e.g. rtk, imu) to ros is communication from external world to ros sys. Things need to take care: mostly sensor hardware device doesn’t support ROS driver directly, so first need device serial or CAN or Ethernet to get the raw sensor data, and package it as sensor/raw_msg to publish out; the real ros-defined sensor node, will subscribe sensor/raw_msg and publish the repackaged sensor/data to the ros system, (which usually happened in ros callback). 1234567891011121314151617181920def rtk_cb(std_msgs::ByteMultiArray raw_msg): rtk_msg = func(raw_msg) pub.publish(rtk_msg)pub = nodeHandler.advertise<sensor_msgs:NavSatFix>("/rtk_gps/data", 1)sub = nodeHandler.subscribe("/rtk_gps/raw_data", 1, rtk_cb)def raw_data_generator(): try: with open("/dev/ttyS0", "r|w") as fd: header = read(fd, buf, header_line) while ros::ok(): content = read(fd, buf, content_lines) std::msgs::ByteMultiArray raw_msg raw_msg.data.push_back(header) raw_msg.data.push_back(content) pub.publish(raw_msg) close(fd) except: print("failed to read raw data\n") sensor CAN data to ros nodesensors(such as camera, radar, lidar e.t.c) go to ros sys through Veh CAN Bus. the difference between CAN msg and serial msg is data atomicity. as serial msg is only one variable, which gurantee atomicity in application level; while each CAN frame usually include a few variables, which need custom implement atomicity in application level. 12345678910111213141516thread0 = pthread_create(recv_thread, raw_can_data)thread1 = pthread_create(send_thread)def thread0(): recv_data = func(raw_can_data) pthread_mutex_lock(mutex_lock) sensor_raw = recv_data sem_post(sem_0) pthread_mutex_unlock(mutex_lock)def thread1(): pthread_mutex_lock(mutex_lock) sensor_ros_msg = sensor_raw pthread_mutex_unlock(mutex_lock) node_.publish(sensor_ros_msg) ros node to external deviceanother kind of communication, is from ros system to external device/env, such as dSPACE. the external device, if not communication through serial, then usually support Ethernet(udp/tcp), which then need to implement a custom udp/tcp data recv/send func. the ros node subscribe the necessary data, then send it out through udp/tcp. 
12345def cb(sensor_msg): data = repack(sensor_msg) udp.send(data)sub = nodeHandler.subscribe("/useful/data", 1, cb) xml-rpc && tcprosthe ros sytem has two communication, to register/update ros node, publish/subscribe topics to ros master. this kind of message go through xml-rpc. after worker nodes registered in master node, the P2P communication can generated, and the data is transfered through tcpros. each ros node has a xml-rpc server, in code, nodeHandler.advertise() called in publisher/subscriber node, to register their topices to ros master. once a subscribe node register to master, which topics it subscribed, master returns a URI as response, then the subscriber and publisher can build connection through this URI. when a publish node register to master, master call publisherUpdate() to notify all subscriber, who subscribe topices from this publisher. ros visual(rviz)ros-rviz how rviz works ? If you want to create a node providing a set of interactive markers, you need to instantiate an InteractiveMarkerServer object. This will handle the connection to the client (usually RViz) and make sure that all changes you make are being transmitted and that your application is being notified of all the actions the user performs on the interactive markers. rviz rosbag rviz config can customize the rviz display, the default located at ~/.rviz/default.rviz. the idea to play rosbag and render in rviz is to define a custom node, to receive the custom pkg_msg from replayed rosbag, then repckage pkg_msg as corresponded marker/markerArray, then publish these msg out, which will be received by rviz usually we define a custom node to receive replayed topics from rosbag.play(), and define a callback func to publish its marker objects out. 12345678910ros::Publisher markerArray def pkg_cb(sensor_pkg): for objIdx in sensor_pkg.ObjNum: prepare_marker(marker, sensor_pkg.objects[objIdx] SensorDisplay.markers.append(marker) markerArray.publish(SensorDisplay) SensorDisplay->markers.clear()subPkg = nodeHandler.subscribe("sensor_pkg", 1, pkg_cb);markerArray = nodeHandler.advertise<visulization_msgs::MarkerArray>("sensor_pkg", 1) ros summaryros is a very common communciation way and message type in ADS dev, many demo are implemented based on ros, which gives a bunch of ros related tools. in this blog, we review three of them: ros based sensor device data collection ros rviz rosbag.play which can support a few kinds of applications, e.g. debuging, replay, data collection.]]></content>
<tags>
<tag>ros</tag>
</tags>
</entry>
<entry>
<title><![CDATA[redis queue(rq) in data processing]]></title>
<url>%2F2020%2F06%2F04%2Fredis-queue-rq-in-data-processing%2F</url>
<content type="text"><![CDATA[backgroundrecently thinking about how to design middleware and frameworks in self-driving team, just like the back-end service in most web servicees, which gives the way to go uppper and go abstract. currently many start-up ADS team, espcially Internet-based start-ups, have built-in ADS frameworks and middlewares, to support their daily dev and easily to implement new features. here is an old topic, redis task queue, how to design a robost data pipeline to support ADAS/ADS functions virtual test with physical collected road data. the experience to put lgsvl into swarm/k8s brings the idea about micro-services, which is a great way to decouple a large work to a few small pieces of independent but communicatable services. so when coming to handle large data set, which is very common in ADS dev. so the first idea is to decouple the data pipeline as a few micro-services: reader service, process service, post-analysis service e.t.c then two questions immediately up: which data/message type fits, which network communication protocol fits.and beyond these two basic questions, also need a lot work about message adapter among/in services. previously, I designed websocket and json solution. but it’s too tedious to plug in a ws-server/client at the front/end of each service, especially as the number of serivces grows. take it back, data pipeline is a heavy data IO work, is it really smart to split the work into a few pieces, then find the network communication among them ? we increase the system complex by introducing additional network communcation moduels, and the only benefit is decouple a heavy data IO work. and more, the network modules need consider cache, job scheduler, load balance issues, as the data process service may take much longer than reader services. traditionally, heavy data IO work is common run in batch processing, disregarding network issues, and it’s better to run directly in memory/cache. so I go to rq interprocess communication in distributed systemMPI is the standard for IPC in HPC apps, of course there are Linux IPC libs, which brings more low-level ipc APIs. MPI apps mostly run on high performance computing cluster, which has the samilar API e.g. Allreduce as Hadoop/MapReduce, while the difference MPI/allReduce doesn’t tolerate failure, which means any node failed, the MPI apps failed. Which is the foundmental difference from HPC to distributed system nowadays, really popular as the new infrastructure for cloud and AI. in the distributed system, there are a few ways to do interprocess communication: RESTful protocol, such as TCP, UDP, websocket. async communication, there are different ways to implement async interprocess communication, one way is message queue, of course many language, e.g. js, go have some light-weight libs/framework to support ansyc communication interprocessly. rpc, thrift is an Apache project, grpc is high efficient with protobuf, but it doesn’t support well service discovery/load balance mechanism inside, which is a limitation in cloud-native applications. dubbo has a better design for service discovery and load balance, the message type by default is json. so all of these can be the corner-stone service in modern micro service envs. also the common micro-service framework, e.g. Spring Cloud has interprocess communication component as well. for data hungry services, batch processing frameworks, e.g. Spring Batch, Linux Parallel should also consider. 
rqthe following is from rq doc Queuesa job is a Python object, namely a function that is invoked async in a worker process. enqueueing is simply pushing a reference to the func and its ars onto a queue. we can add as many Queue instance as we need in one Redis instance, the Queue instance can’t tell each other, but they are hosted in the same redis instance, which gives the way to find jobs binding to Queue1 in worker2 from Queue2 jobs123job1 = q.enqueue(my_func, func_args)job2 = Job.create(my_func, ttl=100, failure_ttl=10, depends_on=, description=, func_args)q.enqueue_job(job2) timeout: specifies the max runtime of job before it’s interrupted and marked as failed. ttl: specifies the maximum queued time(in sec) of the job before it’s dscarded. default is None(infinite TTL) failure_ttl: specifies how long(in sec) failed jobs are kept(default to 1 years) the following sample is a way to find all rq:job:s, but the return is a bytes object, which need encode as utf-8 for any further usage. 12345import redisr = redis.StrictRedis()r.keys()for key in r.scan_iter("rq:job:*"): print(key.encode('utf-8') workersworkers will read jobs from the given queues(the order is important) in an endless loop, waiting for new work to arrive when all jobs done. each worker will process a single job at a time. by default, workers will start working immediately and wait until new jobs. another mode is burst, where to finish all currently avaiable work and quit asa all given queues are emptied. rq worker shell script is a simple fetch-fork-execute loop connectionswhen you want to use multiple connections, you should use Connection contexts or pass connections around explicitly. 12345conn1 = Redis('localhost', 6379)conn2 = Redis('remote.host.org', 9836)q1 = Queue('foo', connection=conn1)q2 = Queue('bar', connection=conn2) Every job that is enqueued on a queue will know what connection it belongs to. The same goes for the workers.within the Connection context, every newly created RQ object instance will have the connection argument set implicitly. 12345def setUp(self): push_connection(Redis())def tearDown(self): pop_connection() this should be the way to handle distributed queues. resultsif a job returns a non-None value, the worker will write that return value back to the job’s Redis hash under result key. the job’s Redis hash itself expire in 500sec by default after the job is finished. 12q.enqueue(foo, result_ttl=86400) # result expires after 1 dayq.enqueue(func_without_rv, result_ttl=500) # job kept explicitly when an exception is thrown inside a job, it’s caught by the worker, serialized and stored under exc_info key. By default, jobs should execute within 180 seconds. After that, the worker kills the work horse and puts the job onto the failed queue, indicating the job timed out. 1q.enqueue(mytask, args=(foo,), kwargs={'bar': qux}, job_timeout=600) # 10 mins job registrieseach queue maintains a set of Job Registries. e.g. StartedJobRegistry, FinishedJobRegistry e.t.c. we can find these after log in redis-cli version bugwhen run rq demo, it reports: 12raise RedisError("ZADD requires an equal number of "redis.exceptions.RedisError: ZADD requires an equal number of values and scores manually change /rq/registry.py: 12 # return pipeline.zadd(self.key, {job.id: score})return pipeline.zadd(self.key, job.id, score) data pipeline for ADS function verification queues each queue instance can be taken as a separate namespace in the Redis instance, so the workers only process the jobs in the same queue. 
but if multi-queues are hosted in the same Redis instance, then Redis api can find Queue A’s jobs in Queue B’s workers. 1234conn = Redis() mf4q = Queue('mf4Q', connection=conn)aebq = Queue('aebQ', connection=conn)dbq = Queue('dbQ', connection=conn) jobs if the handler_fun has return values, namely status. rq store its status at job.result, thle lifecycle of which can be controlled by result_ttl, e.t.c. to control the order of running the jobs, jobs can have depend. 12345678def mf4_jober(url_path): mf4_job = mf4q.enqueue(mf4_reader, args=(url_path,), timeout=60, ttl=60, failure_ttl=1, job_timeout=60, result_ttl=60, job_id=mf4_job_id)def aeb_jober(mf4_frames): aeb_job = aebq.enqueue(aeb_oneStep, args=(i_, ), timeout=60, ttl=20, failure_ttl=1, result_ttl=10, job_id=aeb_job_id) def db_jober(aeb_frame, idx): db_job= dbq.enqueue(db_oneStep, args=(aeb_frame,), timeout=60, ttl=20, failure_ttl=1, job_id=db_job_id) workers 12345678def mf4_workers(conn, num_workers=1): for i in range(num_workers): worker_ = Worker([mf4q], connection=conn, name=worker_name) workers_.append(worker_) for w in workers_: w.work(burst=True)def aeb_workers()def db_workers() runners 12345678def runner(conn): mf4_workers(conn) for k in conn.scan_iter("rq:job:mf4_job_*"): t_ = k.decode('utf-8') j_ = Job.fetch(t_[7:], connection=conn) aeb_jober(j_.result) input("hold on ...") aeb_workers(conn) since the output of mf4 jobs is the input of aeb, so we need runners, similar for db. summary rq is a good framework for this level data pipleine. for even bigger and complex system, rq maybe just a middleware, and there should be an even larger framework. the software engineering idea about framework and middleware in a large system gradually become the foundataion of ADS team in distributed system, there are a few basic concepts distributed consensus, the popular choice e.g. zookeeper, etcd; interprocess communication, distributed cache e.t.c. really cool to know. referxiaorui blog: rq 微服架构中的进程间通信 微服务架构 使用gRPC构建微服务 微服务, 通信协议对比 理解批处理的关键设计 spring batch 批处理]]></content>
<tags>
<tag>redis</tag>
</tags>
</entry>
<entry>
<title><![CDATA[Linux iptables]]></title>
<url>%2F2020%2F05%2F23%2FLinux-iptables%2F</url>
<content type="text"><![CDATA[background从iptable详解整理的。 理解iptables,是为了更好的理解k8s中网络。 从逻辑上讲。防火墙可以大体分为主机防火墙和网络防火墙。 主机防火墙:针对于单个主机进行防护。 网络防火墙:往往处于网络入口或边缘,针对于网络入口进行防护,服务于防火墙背后的本地局域网。 iptables其实不是真正的防火墙,我们可以把它理解成一个客户端代理,用户通过iptables这个代理,将用户的安全设定执行到对应的”安全框架”中,这个”安全框架”才是真正的防火墙,这个框架的名字叫netfilter Netfilter是Linux操作系统核心层内部的一个数据包处理模块,它具有如下功能: 网络地址转换(Network Address Translate) 数据包内容修改 以及数据包过滤的防火墙功能 规则一般的定义为”如果数据包头符合这样的条件,就这样处理这个数据包”. 规则分别指定了源地址、目的地址、传输协议(如TCP、UDP、ICMP)和服务类型(如HTTP、FTP和SMTP)等. 当数据包与规则匹配时,iptables就根据规则所定义的方法来处理这些数据包,如放行(accept)、拒绝(reject)和丢弃(drop)等。配置防火墙的主要工作就是添加、修改和删除这些规则。 如果我们想要防火墙能够达到”防火”的目的,则需要在内核中设置关卡,所有进出的报文都要通过这些关卡,经过检查后,符合放行条件的才能放行,符合阻拦条件的则需要被阻止,于是,就出现了input关卡和output关卡,而这些关卡在iptables中不被称为”关卡”,而被称为”链”。当客户端发来的报文访问的目标地址并不是本机,而是其他服务器,当本机的内核支持IP_FORWARD时,我们可以将报文转发给其他服务器。这个时候,我们就会提到iptables中的其他”关卡”,也就是其他”链”,他们就是 “路由前”、”转发”、”路由后”,他们的英文名是 PREROUTING、FORWARD、POSTROUTING 当我们定义iptables规则时,所做的操作其实类似于”增删改查”。 “关卡“/”链“ 包括: prerouting, INPUT, OUTPUT, FORWARD, POSTROUTING 表: filter 负责过滤(iptables_filter); nat(network address translation), 网络地址转发, mangle, raw 表名: 1iptables -t filter -L (INPUT/OUTPUT/FORWARD/DOCKER-USER/DOCKER-ISOLATION-STAGE/KUBE-EXTERNAL-SERVICES/KUBE-FIREWALL/UBE-FORWARD/KUBE-KUBELET-CANARY/KUBE-SERVICES) -t选项,查看表名(filter/mangle/nat)-L 选项,规则/链名-n 选项,表示不对IP地址进行名称反解,直接显示IP地址。–line-number 选项, 显示规则的编号 规则大致由两个逻辑单元组成,匹配条件与动作。 测试1:拒绝某个远程主机(10.20.180.12)访问当前主机(10.20.181.132) 1iptables -t filter -I INPUT -s 10.20.180.12 -j DROP -I 选项, 插入规则到哪个链 -s 选项,匹配条件中的源地址 -j 选项,当匹配条件满足,采取的动作 测试1.1:拒绝某个远程主机(10.20.180.12)访问当前主机(10.20.181.132),再追加一条接受(10.20.180.12)的访问 1iptables -A INPUT -s 10.20.180.12 -j ACCEPT -A选项,追加规则到某个链。 不通 即规则的顺序很重要。如果报文已经被前面的规则匹配到,iptables则会对报文执行对应的动作,即使后面的规则也能匹配到当前报文,很有可能也没有机会再对报文执行相应的动作了。 测试2: 根据规则编号去删除规则 12iptables --line -vnL INPUTiptables -t filter -D INPUT N -D选项,删除某条链上的第N条规则 测试2.2: 根据具体的条件去执行删除规则 12iptables -vnL INPUTiptables -t filter -D INPUT -s 10.20.180.12 -j DROP 修改规则,一般就是删除旧规则,再添加新规则。 保存规则 当重启iptables服务或者重启服务器以后,我们平常添加的规则或者对规则所做出的修改都将消失,为了防止这种情况的发生,我们需要将规则”保存” 12iptables-save > /etc/network/iptables.up.rulesiptables-apply #restart iptables 更多关于匹配条件 -s 选项, 指定源地址作为匹配条件,还可以指定一个网段,或用 逗号分割多个IP; 支持取反。 -d 选项,指定目标地址作为匹配条件。 不指定,默认就是(0.0.0.0/0),即所有IP -p 选项,指定匹配的报文协议类型。 12iptables -t filter -I INPUT -s 10.20.180.12 -d 10.20.181.132 -p tcp -j REJECTssh 10.20.180.12 #from 10.20.181.132 ssh t0 10.20.180.12 suppose to be rejected. but not ? -i选项,匹配报文通过哪块网卡流入本机。 iptables之网络防火墙当外部网络中的主机与网络内部主机通讯时,不管是由外部主机发往内部主机的报文,还是由内部主机发往外部主机的报文,都需要经过iptables所在的主机,由iptables所在的主机进行”过滤并转发”,所以,防火墙主机的主要工作就是”过滤并转发” 主机B也属于内部网络,同时主机B也能与外部网络进行通讯,如上图所示,主机B有两块网卡,网卡1与网卡2,网卡1的IP地址为10.1.0.3,网卡2的IP地址为192.168.1.146: c主机网关指向B主机网卡1的IP地址;A主机网关指向B主机网卡2的IP地址。 on hostmachine A: 123route add -net 10.1.0.0/16 gw 192.168.1.146 ping 10.1.0.1 # ping machine C, not availableping 10.1.0.3 # ping machine B(NIC 2), avaiailable 为什么10.1.0.1没有回应。 A主机通过路由表得知,发往10.1.0.0/16网段的报文的网关为B主机,当报文达到B主机时,B主机发现A的目标为10.1.0.1,而自己的IP是10.1.0.3,这时,B主机则需要将这个报文转发给10.1.0.1(也就是C主机),但是,Linux主机在默认情况下,并不会转发报文,如果想要让Linux主机能够转发报文,需要额外的设置,这就是为什么10.1.0.1没有回应的原因,因为B主机压根就没有将A主机的ping请求转发给C主机,C主机压根就没有收到A的ping请求,所以A自然得不到回应 为什么10.1.0.3会回应。 这是因为10.1.0.3这个IP与192.168.1.146这个IP都属于B主机,当A主机通过路由表将ping报文发送到B主机上时,B主机发现自己既是192.168.1.146又是10.1.0.3,所以,B主机就直接回应了A主机,并没有将报文转发给谁,所以A主机得到了10.1.0.3的回应 如何让LINUX主机转发报文 check /proc/sys/net/ipv4/ip_forward. if content is 0, meaning the Linux hostmachine doesn’t forward; if content is 1, meaning the Linux hostmachine does forward. or 1sysctl -w net.ipv4.ip_forward=1 for permenentally allow Linux host forward, update file /etc/sysctl.conf. 
after enabling packet forwarding on the Linux host, A ping C and C ping A should both work. if we want the internal hosts to be able to access a web service on an external host, what should we do? we need to allow the internal hosts' web requests to external hosts in the FORWARD chain. on host B: iptables -I FORWARD -j REJECT iptables -I FORWARD -s 10.1.0.0/16 -p tcp --dport 80 -j ACCEPT on host B all forwarded traffic is REJECTed, and only forwarding from the internal 10.1.0.0/16 subnet to destination port 80 is ACCEPTed (i.e. the internal subnet can access the external network). C ping A (ok). --dport is the destination port. to let A reach C, we need to add one more rule to B's iptables: iptables -I FORWARD -d 10.1.0.0/16 -p tcp --sport 80 -j ACCEPT --sport is the source port. when configuring rules we often need to consider both directions, because a FORWARD rule only matches what it explicitly defines. after the change above, A can ping C, but C can no longer reach A. a better way: no matter whether the request goes from inside to outside or the other way round, established traffic should be FORWARDed: iptables -I FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT more actions: NAT, network address translation. NAT simply means modifying the IP addresses of packets; the NAT function is usually integrated into routers, firewalls or dedicated NAT devices. why modify a packet's IP address at all? scenario 1: there are 10 hosts inside the network, each with its own IP address. when an internal host communicates with hosts in other networks, it exposes its own IP address. what if we want to hide these hosts' IP addresses? when an internal host sends a packet to an external host, the packet passes through the firewall or router, which rewrites the packet's source IP to the firewall's or router's own IP; when hosts in other networks receive these packets, the source IP they see is the router's or firewall's, not the 10 hosts' IPs, which hides the internal hosts' IPs. meanwhile the router maintains a NAT table recording the internal host's IP and port. when an external host replies, it sends the response packet to the router, and the router, according to the mapping in the NAT table, rewrites the response packet's destination IP and port back to the internal host's IP and port, then sends the response to the internal host. in the process just described, the "IP address translation" happens twice: when the internal network's packet is sent out, its source IP is modified, which is source address translation, Source Network Address Translation, abbreviated SNAT; when the external network responds, the response packet's destination IP is modified again, which is destination address translation, Destination Network Address Translation, abbreviated DNAT. whether the internal network accesses the external network or the other way round, both of these translations happen. generally, internal hosts requesting external services is called snat, and external hosts requesting internal services is called dnat. the scenario above not only hides the internal hosts' IP addresses, but also lets the hosts in the LAN share one public IP, so that hosts with private IPs can access the Internet. for example, the whole company has only one public IP but 10 computers; how can all 10 computers access the Internet? just configure the public IP on the router; when a private host accesses a public service, the packet passes through the router, which modifies and maps the private IP and port in the packet to the public IP and port; this way the internal hosts share the public IP to access services on the Internet. scenario 2: the company has its own LAN with two hosts acting as servers, host 1 providing a web service and host 2 providing a database service, but both servers use private IP addresses and can only be reached by hosts within the LAN; the Internet cannot reach these two servers, and the company has only one usable public IP. how can the public network reach the company's internal services? configure this public IP on one of the company's hosts or on the router and announce that this IP provides the web service and the database service; Internet hosts then send their request packets to this public IP, i.e. the destination IP of the packets is the public IP. when the router receives such a packet, it rewrites the destination address to the corresponding private address; for example, if the packet's destination IP and port are public-IP:3306, we rewrite the destination to host 2's private IP:3306, and similarly public-IP:80 is mapped to host 1's private IP:80. when the private host replies to the request, the source of the reply is mapped back from private IP+port to public IP+port, and then sent by the router or public host to the host on the Internet. the test environment is the same A/B/C hosts as above. SNAT test on B hostmachine: iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -j SNAT --to-source 192.168.1.146 SNAT rules can only live in the POSTROUTING chain and the INPUT chain. "--to-source" is the usual option of the SNAT action, used to specify which IP address SNAT should rewrite the packet's source IP to; here it is B's public IP. DNAT test on B machine: iptables -t nat -F #flush nat table iptables -t nat -I PREROUTING -d 192.168.1.146 -p tcp --dport 3389 -j DNAT --to-destination 10.1.0.6:3389 "-j DNAT --to-destination 10.1.0.6:3389" means DNAT the matching packets, i.e. destination address translation, rewriting the matching packets' destination address and port to 10.1.0.6:3389. in theory the DNAT rule above should be enough, but in testing, DNAT did not work with only the DNAT rule configured; it turned out that after configuring the corresponding SNAT rule as well, DNAT worked normally, so we also configured the SNAT: iptables -t nat -A POSTROUTING -s 10.1.0.0/16 -j SNAT --to-source 192.168.1.146 MASQUERADE: when we go online via dial-up, the assigned IP address is usually different each time and we don't keep a fixed IP for long. if in that situation we want the internal hosts to share the public IP to go online, it becomes troublesome, because every time the IP address changes we have to reconfigure the SNAT rule, which is not very friendly; MASQUERADE solves this problem. MASQUERADE dynamically translates the source address to an available IP address; it does exactly the same thing as SNAT, i.e. modifying the source address, except that SNAT has to specify which IP to rewrite the source to, while MASQUERADE doesn't need an explicit IP and dynamically rewrites the packet's source address to an available IP on the specified NIC. iptables -t nat -I POSTROUTING -s 10.1.0.0/16 -o en1 -j MASQUERADE packets leaving through B's external NIC, when passing the POSTROUTING chain, automatically get their source address rewritten to an available IP of that external NIC; even if the public IP on the external NIC changes, the internal hosts' packet source IPs are still correctly and dynamically mapped to the corresponding public IP. you can think of MASQUERADE as a dynamic, automated SNAT. REDIRECT: the REDIRECT action can do port mapping on the local machine: iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080 after the rule above, when other machines access port 80 of this machine, the packets are redirected to its port 8080. REDIRECT rules can only be defined in the PREROUTING chain or the OUTPUT chain. refere: iptables essentials; ip route, ip rule & iptables 知多少]]></content>
<tags>
<tag>Linux</tag>
</tags>
</entry>
<entry>
<title><![CDATA[k8s: flannel]]></title>
<url>%2F2020%2F05%2F23%2Fk8s-flannel%2F</url>
<content type="text"><![CDATA[backgroundk8s networks include a few topics: pod to pod communication in k8s pod to the host node communication pod to external service URL external request to pod in k8s the networks have two components: DNS and iptables. DNS used to resovle URL name to IP; iptables used to control network message transfer in/out. preparation image for testto test network inside pod/docker, we need add the following network tools: 12345iputils-ping \ net-tools \ iptables \iproute docker/pod runtime privilegesby default, docker doesn’t allow run iptables inside container. and it give errors: 123root@redisjq:/# iptables -t nat -L | grep INPUT iptables v1.6.0: can't initialize iptables table `nat': Permission denied (you must be root)Perhaps iptables or your kernel needs to be upgraded. which need to add docker runtime privilege and Linux capabilities container capabilities in k8s In a Kubernetes pod, the names are the same, but everything has to be defined in the pod specification. When implementing this in Kubernetes, you add an array of capabilities under the securityContext tag. 1234securityContext: capabilities: add: - NET_ADMIN k8s pod DNSDNS for services and pods introduced four types: None Default, where POD derived DNS config from the host node where to run pod. ClusterFirst, where POD use DNS info from kube-dns or coreDNS ClusterFirstWithHostNet, as the name explained. tips, Default is not the default DNS policy. If dnsPolicy is not explicitly specified, then ClusterFirst is used as default. the purpose of pod/service DNS is used to transfer URL to IP, which is the second step after iptabels is understand successfully. coreDNS is setted during kube-adm init with serverDNS. To do SNAT, the pod/service in K8S need access resolv.conf/DNS inside pod :1234root@redisjq:/redis# cat /etc/resolv.conf nameserver 10.96.0.10search lg.svc.cluster.local svc.cluster.local cluster.localoptions ndots:5 clearly, the pod DNS is coming from cluster, which can be check by: 12345678kubectl describe configmap kubeadm-config -n kube-system kind: ClusterConfigurationkubernetesVersion: v1.18.2networking: dnsDomain: cluster.local podSubnet: 10.4.0.0/16 serviceSubnet: 10.96.0.0/12 and that’s the reason why resolve URL failed, as clusterDNS is kind of random defined. the DNS failure gives errors as following inside pod: 1socket.gaierror: [Errno -3] Temporary failure in name resolution docker0 DNSdocker engine can configure its DNS at /etc/daemon.json, with “dns” section. when running docker in single mode, docker0 looks has host machine’s DNS, but when runnig in swarm mode, need to define additional DNS for external access. iptables in K8Sdocker and iptables you should not modify the rules Docker inserts into your iptables policies. Docker installs two custom iptables chains named DOCKER-USER and DOCKER, and it ensures that incoming packets are always checked by these two chains first a simple test can found, docker0 NIC is bridge is well to bind host network namespace, either export or response request externally. while with flannel.d NIC, pod can’t access external resources. code review: kube-proxy iptables: iptables has 5 tables and 5 chains. the 5 chaines: 12345PREROUTING: before message into route, the external request DNAT.INPUT: message to local host or current network namespace. 
FORWARD: message forward to other host or other network namespace.OUTPUT: message export from current hostPOSTROUTING: after route before NIC, message SNAT the 5 tables: 12345filter table, used to control the network package need to ACCEPT, or DROP, or REJECT when it comes to a chainnat(network address translation) table, used to modify the src/target address of network packagemangle table, used to modify IP header info of network packageraw tablesecurity table for k8s pods/services, mostly consider filter and nat tables. and k8s expand another 7 chains: KUBE-SERVICES、KUBE-EXTERNAL-SERVICES、KUBE-NODEPORTS、KUBE-POSTROUTING、KUBE-MARK-MASQ、KUBE-MARK-DROP、KUBE-FORWARD. virtual network flannelcheck running flanneld /etc/cni/net.d/10-flannel.conflist on host machine is the same as /etc/kube-flannel/cni-conf.json in flannel container on master node. /run/flannel/subnet.env exist in both flannel container (on master node) and in master host machine. it looks like network configure(subnet.env) is copied from container to host machine. so if there is no flannel container running on some nodes, these nodes won’t have the right network configure. at master node, HostPath points to: /run/flannel, /etc/cni/net.d, kube-flannel-cfg (ConfigMap); while at working node(due to missing /gcr.io/flannel image), /run/flannel/subnet.env is missed. previously, I thought to cp this file from master node to woker node is the solution, then this file is missed every time to restart worker node. once copied both kube-proxy and flannel images to worker node, and restart kubelet at worker node, the cluster should give Running status of all these components. including 2 copies of flannel, one running on master node, and the other running on working node. as we are using kubectl to start the cluster, the actual flanneld is /opt/bin/flanneld from the running flannel container, and it maps NIC to the host machine. another thing is, flannel is the core of the default kube-proxy, so kube-proxy image is also required on both nodes. coreDNS run two copies on master node. Flannel mechanismthe data flow: the sending message go to VNC(virtual network card) docker0 on host machine, which transfers to VNC flannel0. this process is P2P. the global etcd service maintain a iptables among nodes, which store the subnet range of each node. 2) the flanneld service on the special node package the sending message as UDP package, and delivery to target node, based on the iptables. 3) when the target node received the UDP package, it unpackage the message, and send to its flannel0, then transfer to its docker0. 1) after flanneld started,will create flannel.1 virtual network card. the purpose of flannel.1 is for across-host network, including package/unpackage UDP, and maintain iptables among the nodes. 2) each node also create cni0 virtual network card, at the first time to run flannel CNI. the purpose of cni0 is same as docker0, and it’s a bridge network, used for communication in the same node. test with redis serviceswe had define a redisjq pod, the following testes are all in this pod: 123456789kubectl exec -it redisjq -n lg /bin/bashping localhost #okping 10.255.18.3 #not ping 10.3.101.101 #notping 10.20.180.12 ifconfig >>eth0, 10.4.1.46>>lo, 127.0.0.1 the output above is initial output before we had any network setttings. basically the pod can only ping localhost, neither the host DNS, or the host IP. the vip(10.4.1.46) is not in the same network namespace as host network space. 
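a small probe that can help with this kind of debugging (my own sketch, not part of the original tests; the hostname and port below are placeholders): it separates a DNS resolution failure, like the socket.gaierror mentioned earlier, from a plain routing/connectivity failure.

import socket

def probe(host, port=80, timeout=2):
    # step 1: DNS resolution (fails with socket.gaierror when the pod DNS is broken)
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror as e:
        return "DNS failed for %s: %s" % (host, e)
    # step 2: TCP reachability (fails when routing/iptables drops the traffic)
    try:
        socket.create_connection((ip, port), timeout=timeout).close()
        return "reachable: %s -> %s:%d" % (host, ip, port)
    except OSError as e:
        return "resolved %s to %s but connect failed: %s" % (host, ip, e)

print(probe("www.baidu.com"))      # placeholder external target
print(probe("10.20.181.132", 22))  # host node IP from the tests above, assuming ssh is listening
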
flannel.d pod on both nodes:1234567david@meng:~/k8s/lgsvl$ kubectl get pods kube-flannel-ds-amd64-85d6m -n kube-system --output=wideNAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATESkube-flannel-ds-amd64-85d6m 1/1 Running 5 15d 10.20.180.12 meng <none> <none>david@meng:~/k8s/lgsvl$ kubectl get pods kube-flannel-ds-amd64-fflsl -n kube-system --output=wideNAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATESkube-flannel-ds-amd64-fflsl 1/1 Running 154 15d 10.20.181.132 ubuntu <none> <none> flannel.d is runing on each node, should triggered by kubelet. flannel.d is used as virtual network interface, to manage across-node pod communication inside k8s. coredns pod in meng node123david@meng:~/k8s/lgsvl$ kubectl get pods coredns-66bff467f8-59g97 -n kube-system --output=wideNAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATEScoredns-66bff467f8-59g97 1/1 Running 4 14d 10.4.0.27 meng <none> <none> coredns has two replicas, both running on master node(meng), and we can see it only has virtual ip/cluster ip ( 10.4.0.x). redisjs pod in ubuntu’s node123david@meng:~/k8s/lgsvl$ kubectl get pods redisjq -n lg --output=wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATESredisjq 1/1 Running 0 20m 10.4.1.47 ubuntu <none> <none> by default, working pod is only running at woker node(ubuntu), which has clusterIP(10.4.1.47). pod1 ping pod2 in the same nodeping successfully there is no doubt in the same node, pods can ping each other. redisjq pod1(10.4.1.47) in ubuntu ping corends pod2(10.4.0.27) in meng123root@redisjq:/redis# ping 10.4.0.27 PING 10.4.0.27 (10.4.0.27) 56(84) bytes of data.64 bytes from 10.4.0.27: icmp_seq=1 ttl=62 time=0.757 ms ping successfully, which is the working of flanneld. redisjq pod1(10.4.1.7) in ubuntu ping hostIP(10.20.181.132) of ubuntu123root@redisjq:/redis# ping 10.20.181.132PING 10.20.181.132 (10.20.181.132) 56(84) bytes of data.64 bytes from 10.20.181.132: icmp_seq=1 ttl=64 time=0.127 ms ping successfuly, pod in cluster can ping its host node, sounds no problem. redisjq pod1(10.4.1.7) in ubuntu ping hostIP(10.20.180.12) of meng12root@redisjq:/redis# ping 10.20.180.12 PING 10.20.180.12 (10.20.180.12) 56(84) bytes of data. ping failed, interesting, so pod in cluster can’t ping any non-hosting node’s IP. so far, pod with vip can ping any other pod with vip in the cluster, no matter in the same node or not. pod with vip can only ping its host machine’s physical IP, but pod can’t ping other hostIP. namely, the network of pod VIP inside k8s and the bridge network from pod vip to its host is set well. but the network from pod to external IP is not well. these 4 tests give a very good understanding about flannel’s function inside k8s: pod to pod in the same node or not. but usually we need SNAT or DNAT. to make SNAT/DNAT avaiable, we need understand DNS & iptables of k8s. update iptables to allow pod access public IPcni0, docker0, eno1, flannel.1 in host machine vs eth0 in podthese virtual NIC are common in k8s env. 
on node1:
cni0: 10.4.1.1
docker0: 172.17.0.1
eno1: 10.20.181.132
flannel.1: 10.4.1.0
on pod1, which is running on node1:
eth0: 10.4.1.48
pod1 -> pod2 network message flow:
pod1(10.4.1.48) on node1(10.20.181.132) -> cni0(10.4.1.1) -> flannel.1(10.4.1.0) -> kube-flannel on node1(10.20.181.132) -> kube-flannel on node2(10.20.180.12) -> flannel.1 on node2 -> cni0 on node2 -> pod2(10.4.1.46) on node2
to allow SNAT on outgoing messages, namely to FORWARD internal clusterIP traffic to external services, we can add the following new iptables rule:
iptables -t nat -I POSTROUTING -s 10.4.1.0/24 -j MASQUERADE
after adding the new rule, check inside the pod:
root@redisjq:/redis# ping 10.20.180.12
PING 10.20.180.12 (10.20.180.12) 56(84) bytes of data.
64 bytes from 10.20.180.12: icmp_seq=1 ttl=62 time=0.690 ms
root@redisjq:/redis# ping 10.20.181.132
PING 10.20.181.132 (10.20.181.132) 56(84) bytes of data.
64 bytes from 10.20.181.132: icmp_seq=1 ttl=64 time=0.108 ms
root@redisjq:/redis# ping 10.20.180.61
PING 10.20.180.61 (10.20.180.61) 56(84) bytes of data.
64 bytes from 10.20.180.61: icmp_seq=1 ttl=126 time=0.366 ms
root@redisjq:/redis# ping www.baidu.com
ping: unknown host www.baidu.com
root@redisjq:/redis# ping 61.135.169.121 #baidu IP
PING 61.135.169.121 (61.135.169.121) 56(84) bytes of data.
64 bytes from 61.135.169.121: icmp_seq=1 ttl=51 time=8.16 ms
DNS is not set up yet, so we can't ping www.baidu.com, but we can ping its IP.
on the other hand, to FORWARD external requests to internal clusterIPs, we can add the following new iptables rule:
iptables -t nat -I PREROUTING -d 10.4.1.0/24 -j MASQUERADE
that's the beauty of iptables. as mentioned previously, to handle the pod DNS error, we need to add a pod/service DNS policy inside the pod.yaml:
spec:
  dnsPolicy: Default
our k8s cluster has no DNS server of its own, so to do SNAT/DNAT we have to keep the Default dns policy, which makes the pod/service use its host machine's DNS, defined in /etc/resolv.conf. one thing to take care of: some host machines only have nameserver 127.0.0.1 in resolv.conf, in which case we need to add a real DNS server.
summary
with this knowledge about iptables and dns, we can build a useful K8S cluster. the remaining work is to build useful pods.
refer
jimmy song: config K8S DNS: kube-dns
configure DNS settings in Ubuntu
zhihu: k8s network
k8s expose service
k8s network advance
docker iptables from tencent cloud]]></content>
<tags>
<tag>k8s</tag>
</tags>
</entry>
<entry>
<title><![CDATA[k8s setup 2]]></title>
<url>%2F2020%2F05%2F17%2Fk8s-setup-2%2F</url>
<content type="text"><![CDATA[backgroundthis blog try to deploy two service in k8s: redis and dashboard. the other is engineering. manual deploy via kubectl1234567kubectl create ns test-busyboxkubectl run busybox --namespace=test-busybox \ --port=8280 \ --image=busybox \ -- sh -c "echo 'Hello' > /var/www/index.html && httpd -f -p 8280 -h /var/www/"kubectl get pods -n test-busybox #should display `Running`, but `Pending` error1: pending pod 1kubectl describe pods/busybox -n test-busybox gives: 1Warning FailedScheduling <unknown> default-scheduler 0/2 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate. a few things to check: swapoff -a to close firewall on working node kubectl uncordon to make node schedulable kubectl uncordon error 2: failed create pod sandbox 1Warning FailedCreatePodSandBox 25s (x4 over 2m2s) kubelet, ubuntu Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.2": Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) solution is to copy k8s.grc.io/pause:3.2 image to ubuntu node, and restart kubelet on working node. error 3: no network plugin CNI 1networkPlugin cni failed to set up pod "busybox_test-busybox" network: open /run/flannel/subnet.env: no such file or directory a temp solution is to cp /run/flannel/subnet.env from master node to worker node, then restart kubelet at the worker node. as further study, the cp subnet.env to worker node is not the right solution, as every time the worker node shutdown, this subnet.env file will delete, and won’t restart when reboot the worker node the next day. so the final solution here is to pull quay.io/coreos/flannel image to worker node, as well as k8s.gcr.io/kube-proxy. in later k8s version, kube-proxy is like a proxy, what’s really inside is the flannel daemon. so we need both kube-proxy and flannel at worker node, to guarantee the network working. we can see the busybox service is running well: 1234567kubectl expose pod busybox --type=NodePort --namespace=test-busyboxkubectl get pods --output=wide -n test-busyboxNAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATESbusybox 1/1 Running 0 7m57s 10.4.0.3 ubuntu <none> <none>kubectl get service busybox -n test-busyboxNAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGEbusybox NodePort 10.107.117.219 <none> 8280:32431/TCP 33s but the problem here is, we can’t access this service from host machine. exposing an external IP to access an app in clusterto expose service externally, define the service as eitherLoadBalancer or NodePort type. but LoaderBalancer requires external third-party: 23562 implement of load balancer, e.g. AWS.why loadBalancer service doesn’t work: if you are using a custom Kubernetes Cluster (using minikube, kubeadm or the like). In this case, there is no LoadBalancer integrated (unlike AWS or Google Cloud). With this default setup, you can only use NodePort or an Ingress Controller. 
1234567891011kubectl apply -f /home/gwm/k8s/busybox.yamlkubectl get deployments hello-world #display info of Deploymentkubectl describe deployments hello-worldkubectl get replicasets #display info of ReplicaSetkubectl describe replicasetskubectl expose deployment hello-world --type=NodePort --name=my-service # create a service object that exposes the deployment kubectl get services my-service kubectl describe services my-service#cleanup when test donekubectl delete services my-servicekubectl delete deployment hello-world looks the NodePort service doesn’t work as expected: 12curl http://10.20.180.12:8280 curl: (7) Failed to connect to 10.20.180.12 port 8280: Connection refused if pods can’t be cleaned by kubectl delete pods xx, try kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>. how to access k8s service outside the cluster kubectl configreconfigure a node’s kubelet in a live clusterBasic workflow overview The basic workflow for configuring a kubelet in a live cluster is as follows: Write a YAML or JSON configuration file containing the kubelet’s configuration. Wrap this file in a ConfigMap and save it to the Kubernetes control plane. Update the kubelet’s corresponding Node object to use this ConfigMap. dump configure file of each node 1NODE_NAME="the-name-of-the-node-you-are-reconfiguring"; curl -sSL "http://localhost:8001/api/v1/nodes/${NODE_NAME}/proxy/configz" | jq '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"' > kubelet_configz_${NODE_NAME} our cluster have ubuntu and meng(as leader) two nodes. with these two config files, we found two existing issues: 1) network config on two nodes doesn’ match each other 123456< "clusterDomain": "xyz.abc",---> "clusterDomain": "cluster.local",< "10.3.0.10"---> "10.96.0.10" after generrating the NODE config files above, we can edit these files, and then push the edited config file to the control plane: 12NODE_NAME=meng; kubectl -n kube-system create configmap meng-node-config --from-file=kubelet=kubelet_configz_${NODE_NAME} --append-hash -o yamlNODE_NAME=ubuntu; kubectl -n kube-system create configmap ubuntu-node-config --from-file=kubelet=kubelet_configz_${NODE_NAME} --append-hash -o yaml after this setting up, we can check the new generated configmaps: 12kubectl get configmaps -n kube-systemkubectl edit configmap meng-node-config-t442m526c5 -n kube-system tips: configMaps is also an Object in k8s, just like namespace, pods, svc. but which is only in /tmp, need manually dump. namely: 12meng-node-config-t442m526c5 1 35mubuntu-node-config-ghkg27446c 1 18s set node to use new configMap, by kubectl edit node ${NODE_NAME}, and add the following YAML under spec: 12345configSource: configMap: name: CONFIG_MAP_NAME # replace CONFIG_MAP_NAME with the name of the ConfigMap namespace: kube-system kubeletConfigKey: kubelet observe the node begin with the new configuration 1kubectl get node ${NODE_NAME} -o yaml 2) kubectl command doesn’t work at worker node basically, worker node always report error: Missing or incomplete configuration info. Please point to an existing, complete config file when running kubectl command. which needs to copy /etc/kubernetes/admin.conf from master to worker, then append cat "export KUBECONFIG=/etc/kubernetes/admin.conf" >> /etc/profile at worker node. organizing cluster accesss using kubecnfig files docker0 iptables transferwhen starting docker engine, docker0 VNC is created, and this vnc add its routing rules to the host’s iptables. 
From docker 1.13.1, the routing rules of docker0 vnc is only transfer to localhost of the host machine, namely docker0 to any other non-localhost is forbidden, which leads to the service can only be access on the host machine, where this pod/container is running. in multi-nodes k8s, we need enable iptable FORWARD. append the following line to ExecStart line in file /lib/systemd/system/docker.service: 1ExecStartPost=/sbin/iptables -I FORWARD -s 0.0.0.0/0 -j ACCEPT then restart docker engine: 12systemctl daemon-reloadsystemctl restart docker.service after enable docker0 iptable rules, the following test service can be accessed on both nodes. deploy redis servicecreate a k8s-redis image1234567891011# use existing docker image as a baseFROM ubuntu:16.04# Download and install dependencyRUN apt-get update && apt-get install -y --no-install-recommends redis-server# EXPOSE the port to the Host OSEXPOSE 6379# Tell the image what command it has to execute as it starts as a containerCMD ["redis-server"] build the image and push to both nodes. deploy a redis-deployment create redis-deployment.yaml: 123456789101112131415161718192021apiVersion: apps/v1kind: Deploymentmetadata: labels: app.kubernetes.io/name: load-balancer-example name: kredis-deploymentspec: replicas: 1 selector: matchLabels: app.kubernetes.io/name: load-balancer-example template: metadata: labels: app.kubernetes.io/name: load-balancer-example spec: containers: - image: 10.20.181.119:5000/k8s_redis name: kredis ports: - containerPort: 6379 expose deployment as service 12345678910111213kubectl create ns test-busyboxkubectl apply -f redis-deployment.yamlkubectl get deployments redis-deployment #display info of Deploymentkubectl describe deployments redis-deploymentkubectl get replicasets #display info of ReplicaSetkubectl describe replicasetskubectl expose deployment redis-deployment --type=NodePort --name=my-redis # create a service object that exposes the deployment kubectl get services my-redis kubectl describe services my-redis kubectl get pods --output=wide#clean up later (afer step 3)kubectl delete services my-rediskubectl delete deployment redis-deployment access as pod123456789gwm@meng:~/k8s/alpine$ kubectl get pods --output=wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATESkredis-deployment-7567b7f4b7-wmqgd 1/1 Running 0 16h 10.4.1.18 ubuntu <none> <none>gwm@meng:~/k8s/alpine$ redis-cli -p 6379Could not connect to Redis at 127.0.0.1:6379: Connection refusednot connected> gwm@ubuntu:~$ redis-cli -p 6379Could not connect to Redis at 127.0.0.1:6379: Connection refusednot connected> as we can see here, as redis-server as pod, won’t expose any port. and pod-IP(10.4.1.18) is only accessible inside cluster access as service1234567891011kubectl get services --output=wide NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTORkredis-deploy NodePort 10.104.43.224 <none> 6379:31962/TCP 23h app.kubernetes.io/name=load-balancer-examplekubernetes ClusterIP 10.96.0.1 <none> 443/TCP 8d <none>root@ubuntu:~# docker container inspect 60bfd6c5ccac | grep 31962 root@ubuntu:~# redis-cli -p 31962 127.0.0.1:31962> root@ubuntu:~# redis-cli -p 31962 -h 10.20.181.132 10.20.181.132:31962> gwm@meng:~$ redis-cli -h 10.20.181.132 -p 31962 10.20.181.132:31962> so basically, we can access redis as service with the exposed port 31962, and the host node’s IP(10.20.181.132), (rather than the serivce cluster IP(10.104.43.224). tips, only check service, won’t tell on which node, the pod is running. so need check the pod, and get its node’s IP. 
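the same access check can be done from any client host with redis-py instead of redis-cli; a minimal sketch, using the node IP and the NodePort from the output above as placeholders (the list name is just for illustration):

```python
#!/usr/bin/env python3
# connect to the redis deployment through its NodePort service from outside the cluster.
# host/port are the node IP and exposed NodePort from the `kubectl get services` output
# above -- placeholders, replace with your own values.
import redis

r = redis.Redis(host="10.20.181.132", port=31962, socket_connect_timeout=3)
print(r.ping())                         # True when the service is reachable
r.lpush("job_queue", "scenario0.py")    # hypothetical list name, just to prove writes work
print(r.llen("job_queue"))
```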
with docker StartExec with iptable FORWARD, redis-cli on on both ubuntu node and meng node can access the service. in summary: if we deploy service as NodePort, we suppose to access the service with its host node’s IP and the exposed port, from external/outside of k8s. endpointsk8s endpoints. what’s the difference from endpoints to externalIP ? 123kubectl get endpoints NAME ENDPOINTS AGEkubernetes 10.20.180.12:6443 8d it gives us the kubernetes endpoints, which is avaiable on both meng and ubuntu nodes. 1234gwm@meng:~$ curl http://10.20.180.12:6443 Client sent an HTTP request to an HTTPS server.gwm@ubuntu:~$ curl http://10.20.180.12:6443 Client sent an HTTP request to an HTTPS server. not every service has ENDPOINTS, which gives the way to access outside of the cluster. but NodePort type service can bind to the running pod’s host IP with the exported port. whenever expose k8s service to either internally or externally, it goes through kube-proxy. when kube-proxy do network transfer, it has two ways: Userspace or Iptables. clusterIP, is basically expose internnaly, with the service’s cluster IP; while nodePort type, is basically bind the service’s port to each node, so we can access the service from each node with the node’s IP and this fixed port. apiservercore of k8s: API Server, is the RESTful API for resource POST/GET/DELETE/UPDATE. we can access through: 1234curl apiserver_ip:apiserver_port/apicurl apiserver_ip:apiserver_port/api/v1/podscurl apiserver_ip:apiserver_port/api/v1/servicesCURL apiserver_ip:apiserver_port/api/v1/proxy/nodes/{name}/pods/ check apiServer IP 12kubectl get pods -n kube-system --output=widekube-apiserver-meng 1/1 Running 2 8d 10.20.180.12 meng <none> <none> if check the LISTEN ports on both worker and master nodes, there are many k8s related ports, some are accessible, while some are not. k8s dashboardthe following is from dashboard doc in cn download src 123docker search kubernetes-dashboard-amd64docker pull k8scn/kubernetes-dashboard-amd64docker tag k8scn/kubernetes-dashboard-amd64:latest k8s.gcr.io/kubernetes-dashboard-amd64:latest clear old dashboard resources if there are old running dashboard, can clear first. 12345kubectl get clusterroles kubernetes-dashboard --output=wide kubectl get clusterrolebindings kubernetes-dashboard --output=wide kubectl delete clusterroles kubernetes-dashboard kubectl delete clusterrolebindings kubernetes-dashboard kubectl delete ns kubernetes-dashboard start a fresh dashboard 12kubectl apply -f https://kuboard.cn/install-script/k8s-dashboard/v2.0.0-beta5.yaml kubectl apply -f https://kuboard.cn/install-script/k8s-dashboard/auth.yaml or src from github/dashboard/recommended.yaml, and run: 12kubectl create -f admin-user.yamlkubectl create -f recommended.yaml admin-user.yaml is defined wih admin authorization. if not define or applied, when login to dashboard web UI, it gives some errors like: 1namespaces is forbidden: User "system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard" cannot list resource "namespaces" in API group "" at the cluster scope so there are two tips during creating dashboard. auth/admin-user.yaml is required add NodePort type service to expose dashboard. if not, can’ access dashboard on host machine. 
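once the NodePort service is added, it is worth confirming from the host that the dashboard actually answers over https before trying to log in; a minimal sketch with the requests library (node IP and port are placeholders -- read the real port from kubectl get svc -n kubernetes-dashboard; verify=False because the dashboard serves a self-signed certificate):

```python
#!/usr/bin/env python3
# quick https probe of the dashboard NodePort from outside the cluster.
# NODE_IP / NODE_PORT are placeholders -- get the real NodePort from
# `kubectl get svc -n kubernetes-dashboard`.
import requests
import urllib3

urllib3.disable_warnings()  # the dashboard uses a self-signed certificate

NODE_IP, NODE_PORT = "10.20.180.12", 30443
resp = requests.get(f"https://{NODE_IP}:{NODE_PORT}/", verify=False, timeout=5)
print(resp.status_code)     # 200 means the dashboard UI is being served
```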
refer from deploy dashboard && metrics-server create external-http.yaml to expose NodePort service create admin-user.yaml for admin manage get the ServiceAccount token 1kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}') go to https://nodeIP.6443, tips, dashboard service is using https login dashboard there are two ways to auth to login dashboard: – kubeconfig, the configure to access the cluster – token, every service account has a secret with valid Bearer Token, that can used to login to Dashboard. system checks 123kubectl get secrets -n kubernetes-dashboardkubectl get serviceaccount -n kubernetes-dashboardkubectl describe serviceaccount kubernetes-dashboard -n kubernetes-dashboard metrics-servermetrics-server is a replace of Heapster. 1kubectl edit deploy -n kubernetes-dashboard dashboard-metrics-scraper rolesthe right way to create a role: create a ServiceAccount bind a role for the ServiceAccount(cluster-admin role is needed) make a ClusterRoleBinding for ServiceAccount list all container images in all ns1234kubectl get pods --all-namespaces -o jsonpath="{..image}" |\tr -s '[[:space:]]' '\n' |\sort |\uniq -c referekubectl cheatsheet deployments from k8s doc deploy tiny web server to k8s k8s production best practices cni readme configure network plugins: k8s与flannel网络原理 清晰脱俗的直解K8S网络 k8s: iptables and docker0 linux docker and iptables controlling access to k8s APIserver understand RBAC auth]]></content>
<tags>
<tag>k8s</tag>
</tags>
</entry>
<entry>
<title><![CDATA[k8s setup 1]]></title>
<url>%2F2020%2F05%2F17%2Fk8s-setup-1%2F</url>
<content type="text"><![CDATA[backgroundtransfer from docker swarm to K8S finally. this is a lot engineering work, once have some knowledge about docker/swarm. there are two things: a more general abstract object. e.g. pod, service/svc, deployment, secret, namespace/ns, role e.t.c. more DevOps engineering previously, I palyed with k8s in theory. this time is more about build a k8s cluster in reality. install kubeadm kubeadm, used to initial cluster kubectl, the CLI tool for k8s kubelet, run on all nodes in the cluster all three commands are required on all nodes. check install kube officially swapoff 1sudo swapoff -a create a file /etc/apt/sources.list.d/kubernetes.list with the following content: 1deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main add gpg key 12gpg --keyserver keyserver.ubuntu.com --recv-keys BA07F4FB gpg --export --armor BA07F4FB | sudo apt-key add - apt install 12sudo apt-get update sudo apt-get install kubelet kubectl kubeadm tips, apt-get install will install v1.18.2. restart kubelet 12systemctl daemon-reloadsystemctl restart kubelet if need degrade to v17.3, do the following: 123sudo apt-get install -qy --allow-downgrades kubelet=1.17.3-00sudo apt-get install -qy --allow-downgrades kubeadm=1.17.3-00sudo apt-get install -qy --allow-downgrades kubectl=1.17.3-00 kubeadm setupwe setup k8s with kubeadm tool, which requires a list of images: check the required images to start kubeadm 1kubeadm config images list which returns: 1234567k8s.gcr.io/kube-apiserver:v1.18.2k8s.gcr.io/kube-controller-manager:v1.18.2k8s.gcr.io/kube-scheduler:v1.18.2k8s.gcr.io/kube-proxy:v1.18.2k8s.gcr.io/pause:3.2k8s.gcr.io/etcd:3.4.3-0k8s.gcr.io/coredns:1.6.7 the image source above is not aviable, which can be solved by: 1kubeadm config images pull --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers/ if the command above doesn’t work well, try to docker pull directly and tag the name back to k8s.gcr.io: 12docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/$imageNamedocker tag registry.cn-hangzhou.aliyuncs.com/google_containers/$imageName k8s.gcr.io/$imageName start a k8s clusterafter the preparation above, finally start a k8s cluster as: 1kubeadm init --pod-network-cidr=10.4.0.0/16 --cluster_dns=10.3.0.10 futher, check kubeadm init options: 12345--pod-network-cidr # pod network IP range--service-cidr # default 10.96.0.0/12--service-dns-domain #cluster.local cluster_dns option is used as the cluster DNS/namespace, which will be used in the configureMap for coreDNS forwarding. if start successfully, then run the following as a regular user to config safety-verficiation: 123mkdir -p $HOME/.kubesudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/configsudo chown $(id -u):$(id -g) $HOME/.kube/config check 123sudo kubectl get nodes #both nodes are READYsudo kubectl get pods -n kube-system #check systemsudo kubectl describe pod coredns-xxxx -n kube-system add pod networkpod network, is the network through which the cluster nodes can communicate with each other. 1sudo kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml worker node to join the new k8s cluster : 1234kubeadm resetsudo swapoff -a kubeadm join 10.20.180.12:6443 --token o0pcpc.v3v8bafmbu6e4bcs \ --discovery-token-ca-cert-hash sha256:2a15d392821f8c51416e49e6ccd5393df6f93d738b24b2132e9a9a19276f4f54 then cp flannel-cni.conflist into worker node. /etc/cni/net.d/10-flannel.conflist to the same path in worker node. 
if check sudo kubectl get pods -n kube-system : there may come an error: here found: coredns CrashLoopBackOff or Error this is due to DNS/nameserver resolving issue in Ubuntu, wherecoreDNS serviceforwarding the k8s cluster service to the host/etc/resolv.conf, which only has127.0.0.1`. the cause for CoreDNS to have CrashLoopBackOff is when a CoreDNS Pod deployed in Kubernetes detects a loop. A number of workarounds are available to avoid Kubernetes trying to restart the CoreDNS Pod every time CoreDNS detects the loop and exits. check the coreDNS configMap by : 1kubectl edit cm coredns -n kube-system we see something like: 1234567 prometheus :9153# forward . /etc/resolv.conf forward . 10.3.0.10 cache 30 loop reload loadbalance so modify forward line to forward . 10.3.0.10. or to delete loop service there, which is not good idea. a very good explain test clustertest sample clear clusterclear test 1sudo systemctl stop kubelet kube-proxy flanneld docker understand CNI (container network interface)the following network plugin can be found from k8s cluster networking backgroud container network is used to connect (container itself) to other containers, host machine or external network. container in runtime has a few network mode: 12345nonehostbridge CNI brings a general network framework, used to manage network configure and network resources. coreDNSfirst, run coreDNS as a service in the cluster. then, update kubelet parameters to include IP of coreDNS and the cluster domain. if there is no existing running Kube-DNS, or need a different CLusterIP for CoreDNS, then need update kubelet configuration to set cluster_dns and cluster_domain appropriately, which can be modified at: /etc/systemd/system/kubelet.service/10-kubeadm.conf with additional options appending at kubelet line : 1--cluster_dns=10.3.0.10 --cluster_domain=cluster.local restart kubelet service 123systemctl status kubelet systemctl daemon-reload systemctl restart docker flannel mis-usagein the settings above, I manually copy flannel-cni.conflist and /run/flannel/subnet.env to worker node every time, whenever reboot the worker node. if else, the cluster bother the worker node is NotReady. as we deploy the cluster with kubectl, which is kind of a swarm service deploy tool. so the right way to use flannel should have all k8s.gcr.io/kube-proxy, quay.io/coreos/flannel images at worker node as well. for version1.17+, flannel replace the default kube-proxy, but it still requires to have kube-proxy running in each node(kubelet). after restart kubelet, checking pods -n kube-system, it shows kube-proxy and flannel on each node has a Running status. coreDNS services has run the same size of copies as the number of nodes, but we can find that all of them are running on leader node. understand pod in k8saccessing k8s pods from outside of cluster hostNetwork: true this option applies to k8s pods, which work as --network=host in docker env. options can used for create pod: name command args env resources ports stdin tty create pod/deployment using yaml k8s task1: define a command and args for a container templating YAML in k8s with real code but hostNetwork is only yaml supported hostPort the container port is exposed to the external network at :. 12345spec: containers: ports: - containerPort: 8086 hostPort: 8086 hostPort allows to expose a single container port on the hostIP. 
but the hostIP is dynamic when container restarted nodePort by default, services are accessible at ClusterIP, which is an internal IP address reachable from inside the cluster. to make the service accessible from outside of the cluster, can create a NodePort type service. once this service is created, the kube-proxy, which runs on each node of the cluster, and listens on all network interfaces is instructed to accept connections on port 30000, (from any IP ?). the incoming traffc is forwardedby the kube-proxy to the selected pods in a round-robin fashion. this service represents a static endpoint through which the selected pods can be reached. Ingress The Ingress controller is deployed as a Docker container on top of Kubernetes. Its Docker image contains a load balancer like nginx or HAProxy and a controller daemon. view pods and nodes check running pods on which node resolv.conf in k8s podrun as interactive into a k8s pod, then check its resolv.conf: 123nameserver 10.96.0.10search default.svc.cluster.local svc.cluster.local cluster.localoptions ndots:5 10.96.0.10 is the K8S DNS server IP, which is actually the service IP of kube-dns service. interesting, we can ping neither 10.96.0.10, nor 10.4.0.10, which is not existing service in the cluster, nor 10.3.0.10, which is the coreDNS forwarding IP. remember during setup the k8s cluster, we had define the coreDNS forwarding to 10.3.0.10, is this why I can’t run curl http://<ip>:<port> works ? check coreDNS service: 12345kubectl describe pods coredns-66bff467f8-59g97 -n kube-system Name: coredns-66bff467f8-59g97Node: meng/10.20.180.12Labels: k8s-app=kube-dnsIP: 10.4.0.6 when start coreDNS, is actually used to relace kube-dns. understand service in k8sdoc Each Pod gets its own IP address, however in a Deployment, the set of Pods running in one moment in time could be different from the set of Pods running that application a moment later. A Service in Kubernetes is a REST object, similar to a Pod. you can POST a Service definition to the API server to create a new instance. Kubernetes assigns this Service an IP address, sometimes called the clusterIP, Virtual IP and service proxiesEvery node in a Kubernetes cluster runs a kube-proxy, kube-proxy is responsible for implementing a form of virtual IP for Services, whose is type is any but not ExternalName. choosing own IP for serviceYou can specify your own cluster IP address as part of a Service creation request. The IP address that you choose must be a valid IPv4 or IPv6 address from within the service-cluster-ip-range CIDR range that is configured for the API server discovering services ENV variables DNS headless servicesby explicitly specifying “None” for the cluster IP (.spec.clusterIP). publishing services(ServiceTypes)expose a service to an external IP address, outside of the cluster. service has four type: ClusterIP (default): expose the service on a cluster-internal IP, which is only reachable inside the cluster NodePort: expose the service on each node’s IP at a static port(NodePort), to access : : ExternalName: map the services to an externalName LoadBalancer: expose the service externally using third-party load balancer(googl cloud, AWS, kubeadm has none LB) NodePort and LoadBalancer can expose service to public, or outside of the cluster. external IPsif there are external IPs that route to one or more cluster nodes, services can be exposed on those externalIPs. yaml deployment of service/podthe previous sample busybox, is running as pod, through kubectl run busybox ? 
so there is no external deployment obj using yaml file to create service and expose to publicsome basic knowledge: 1) pod is like container in docker, which assigned a dynamic IP in runtime, but this pod IP is only visible inside cluster 2) service is an abstract concept of pod, which has an unique exposed IP, and the running pods belong to this service are managed hidden. pod or deployment or service both pod and deployment are full-fledged objects in k8s API. deployment manages creating Pods by means of ReplicaSets, namely create pods with spec taken from the template. since it’s rather unlikely to create pods directly in a production env. in production, you will almost never use an object with the type pod. but a deployment object, which needs to keep the replicas(pod) alive. what’s use in practice are: 1) Deployment object, where to specify app containers with some specifications 2) service object you need service object since pods from deployment object can be killed, scaled up and down, their IP address is not persistent. kubectrl commands123456789101112131415161718192021222324252627kubectl get [pods|nodes|namespaces|services|pv] --ns your_namespacekubectl describe [pods|nodes|namespaces]kubectl label pods your_pod new-label=your_labelkubectl apply -f [.yaml|.json] #creates and updates resources in the clusterkubectl create deployment service_name --imae=service_image #start a single instance of servicekubectl rollout history deployment/your_service #check history of deployment kubectl expose rc your_sevice --port=80 --target-port=8000kubectl autoscale deployment your_service --min=MIN_Num --max=MAX_Numkubectl edit your_service #edit any API resource in your preferred editor kubectl scale --replicas=3 your_service kubectl delete [pod|service]kubectl logs your_pod # dump pod logs kubectl run -i --tty busybox --image=busybox -- sh # run pod as interactive shell kubectl exec -ti your_pod -- ls | nslookup kubernetes.default #run command in existing pod (1 container case) kubectl is pretty much like docker command and more. refereblog: setup k8s on 3 ubuntu nodes cni readme flannel for k8s from silenceper coreDNS for k8s service discovery]]></content>
<tags>
<tag>k8s</tag>
</tags>
</entry>
<entry>
<title><![CDATA[services deploy in docker swarm]]></title>
<url>%2F2020%2F04%2F28%2Fservices-deploy-in-docker-swarm%2F</url>
<content type="text"><![CDATA[backgroudour application so far includes the following three services: simulator pythonAPI redisJQ docker swarm network has the following three kind of networks, of course bridge to host network. overlay network, services in the same overlay network, can communicate to each other routing network, the service requested can be hosted in any of the running nodes, further as load balancer. host network usually, multi-container apps can be deployed with docker-compose.yml, check docker compse for more details. DNS service discoverythe following is an example from (overlay networking and service discovery: my test env includes 2 nodes, with host IP as following. when running docker services, it will generate a responding virtual IP, while which is dynamic assgined. hostname virtualIP hostIP node1 10.0.0.4 xx.20.181.132 node2 10.0.0.2 xx.20.180.212 a common issue when try first to use overlay network in swarm, e.g. ping the other service doesn’t work, check /etc/resolv.conf file: 12345678910111213141516# cat /etc/resolv.conf nameserver 8.8.8.8nameserver 8.8.4.4``` the default `dns=8.8.8.8` can't ping either hostIP or any docker0 IP. the reason can find [#moby/#23910](https://github.com/moby/moby/issues/23910): When spinning up a container, Docker will by default check for a DNS server defined in /etc/resolv.conf in the host OS, and if it doesn't find one, or finds only 127.0.0.1, will opt to use Google's public DNS server 8.8.8.8. one solution mentioned:* cat /etc/docker/daemon.json ```xml{ "dns": ["172.17.0.1", "your.dns.server.ip", "8.8.8.8"]} add a file /etc/NetworkManager/dnsmasq.d/docker-bridge.conf 1listen-address=172.17.0.1 so basically, the default DNS setting only listens to DNS requests from 127.0.0.1 (ie, your computer). by adding listen-address=172.17.0.1, it tells it to listen to the docker bridge also. very importantly, Docker DNS server is used only for the user created networks, so need create a new docker network. if use the default ingress overlay network, the dns setup above still doesn’t work. another solution is using host network, mentioned using host DNS in docker container with Ubuntu test virtualIP network create a new docker network 1docker network create -d overlay l3 why here need a new network ? due to Docker DNS server(172.17.0.1) is used only for the user created networks start the service with the created network: 1234567docker service create --name lg --replicas 2 --network l3 20.20.180.212:5000/lg``` * check vip on both nodes```sh docker network inspect l3 check vip by the line IPv4Address: 12node1 vip : `10.0.1.5/24`node2 vip: `10.0.1.6/24` go to the running container 1234docker exec -it 788e667ea9cb /bin/bash apt-get update && apt-get install iputils-pingping 10.0.1.5ping 10.0.1.6 now ping service-name directly 12ping lg PING lg (10.0.1.2) 56(84) bytes of data. inspect service 123456docker service inspect lg "VirtualIPs": [ { "Addr": "10.0.1.2/24" } ] ping host IP from contianer vip as far as we add both host dns and docker0 dns to the dns option in /etc/docker/daemon.json, the container vip can ping host IP. 
assign ENV variable from script get services vip get vip list 12vip=`sudo docker service inspect --format '{{.Endpoint.VirtualIPs}}' lgsvl | awk '{print substr($2, 1, length($2)-5)}'`echo $vip create docker service with runtime env 123456789101112ping -c 1 lg | awk 'NR==1 {print $2}' ``` ## multi-services test#### run all services in single docker mode```shdocker run -it -p 6379:6379 --mount source=jq-vol,target=/job_queue redisjq /bin/bash docker run xx.xx.xx.xxx:5000/lgdocker run -it --mount source=jq-vol,target=/pythonAPI/job_queue xx.xx.xx.xxx:5000/redispythonapi /bin/bash check docker-IP of lg : 12docker sh docker container inspect <lg> #get its IP-address update SIMULATOR_HOST for redispythonapi 123docker exec -it <redispythonapi> /bin/bashexport SIMULATOR_HOST=lg_ip_address #from the step above./redis_worker.sh #where all python scenarios are running in queue here we can check the lg container’s IP is 172.17.0.3 and redispythonapi’s IP is 172.17.0.4, then update start_redis_worker.sh with SIMULATOR_HOST=172.17.0.3 get the container IP 12docker container ls | grep -w xx.xx.xx.xxx:5000/lg | awk '{print $1}' docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker container ls | grep -w xx.xx.xx.xxx:5000/lg | awk '{print $1}' ) assign a special IP to service in swarmdocker network create support subnet, which only ip-addressing function, namely we can use custom-defined virtual IP for our services. a sample: 1234567891011121314151617docker network create -d overlay \ --subnet=192.168.10.0/25 \ --subnet=192.168.20.0/25 \ --gateway=192.168.10.100 \ --gateway=192.168.20.100 \ --aux-address="my-router=192.168.10.5" --aux-address="my-switch=192.168.10.6" \ --aux-address="my-printer=192.168.20.5" --aux-address="my-nas=192.168.20.6" \ my-multihost-network``` we can run our application as:```shdocker network create --driver=overlay --subnet=192.168.10.0/28 lgsvl-netdocker service create --name lgsvl --replicas 2 --network lgsvl-net --host "host:192.168.10.2" xx.xx.xx.xxx:5000/lgsvldocker service create --name redis --replicas 1 --network lgsvl-net -p 6379:6379 --mount source=jq-vol,target=/job_queue --constraint 'node.hostname==ubuntu' xx.xx.xx.xxx:5000/redisjqdocker service create --name pythonapi --replicas 1 --network lgsvl-net --mount source=jq-vol,target=/pythonAPI/job_queue xx.xx.xx.xxx:5000/redispythonapi understand subnet mask. IP address include master IP and subnet mask, we choose 28 here, basically generate about 2^(32-28)-2= 14 avaiable IP address in the subnet. but in a swarm env, subnet IPs are consuming more as the nodes or replicas of service increase. taking an example, with 2-nodes and 2-replicas of service, 5 subnet IPs are occupied, rather than 2 run docker network inspect lgsvl-net on both nodes: on node1 gives: 12lg.1 IPV4Address: 192.168.10.11/28lgssvl-net-endpoint: 192.168.10.6/28 on node2 gives: 12345678lg.2 IPV4Address: 192.168.10.4/28lgssvl-net-endpoint: 192.168.10.3/28``` * `docker service inspect lg` gives:```xmlVirualIPs: 192.168.10.2/28 clearly 5 IP address are occupied. and the IP for each internal service is random picked, there is no gurantee service will always get the first avaiable IP. docker serivce with –iponly docker run –ip works, there is no similar --ip option in docker service create. but a lot case require this feature: how to publish a service port to a specific IP address, when publishing a port using --publish, the port is published to 0.0.0.0 instead of a specific interface’s assigned IP. 
and there is no way to assign an fixed IP to a service in swarm. a few disscussion in moby/#26696, add more options to `service create, a possible solution, Static/Reserved IP addresses for swarm services mostly depend on the issues like “ip address is not known in advance, since docker service launched in swarm mode will end up on multiple docker servers”. there should not be applicable to docker swarm setup, since if one decides to go with docker swarm service, has to accept that service will run on multiple hosts with different ip addresses. I.e. trying to attach service / service instance to specific IP address somewhat contradicting with docker swarm service concept. docker service create does have options --host host:ip-address and --hostname and similar in docker service update support host-add and host-rm. 1234567891011121314$ docker service create --name redis --host "redishost:192.168.10.2" --hostname myredis redis:3.0.6``` then exec into the running container, we can check out `192.168.10.2 redishost` is one line in `/etc/hosts` and `myredis` is in `/etc/hostname`but remember, the DNS for this hostIP(192.168.10.2) should be first configured in the docker engine DNS list. if not, even the hostIP is in the arrange of the subnet, it is unreachable from the containers.[another explain](https://www.freecodecamp.org/news/docker-nginx-letsencrypt-easy-secure-reverse-proxy-40165ba3aee2/): by default docker containers are put on their own network. This means that you won’t be able to access your container by it’s hostname, if you’re sitting on your laptop on your host network. It is only the containers that are able to access each other through their hostname.#### dnsrr vs vip ```sh--endpoint-mode dnsrr dnsrr mode, namely DNS round Robin mode, when query Docker’s internal DNS server to get the IP address of the service, it will return IP address of every node running the service. vip mode, return the IP address of only one of the running cntainers. When you submit a DNS query for a service name to the Swarm DNS service, it will return one, or all, the IP addresses of the related containers, depending on the endpoint-mode. dnsrr vs vip: Swarm defaults to use a virtual ip (endpoint-mode vip). So each service gets its own IP address, and the swarm load balancer assigns the request as it sees fit; to prevent a service from having an IP address, you can run docker service update your_app --endpoint-mode dnsrr, which will allow an internal load balancer to run a DNS query against a service name, to discover each task/container’s IP for a given service in our case, we want to assign a speical IP to the service in swarm. why? because our app has websocket server/client communicataion, which is IP address based. we can’ assign service name for WS server/client. check another issue: dockerlize a websocket server global mode to run swarmwhen deploying service with global mode, namely each node will only run one replicas of the service. the benefit of global mode is we can always find the node IP, no matter the IP address is in host network or user-defined overlay network/subnetwork. get service’s IP in global modeget listened port 1234root@c7279faebefa:/lgsvl# netstat -tulpn | grep LISTENtcp 0 0 127.0.0.11:33263 0.0.0.0:* LISTEN - tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 7/lgsvl_Core.x86_64tcp 0 0 0.0.0.0:8181 0.0.0.0:* LISTEN 7/lgsvl_Core.x86_64 both 8080 and 8181 is listening after lgsvl service started. on the lgsvl side, we can modify it to listen on all address with 8181 port. 
then the following python script to find node’s IP: 1234567891011import socket def get_host_ip(): try: s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) s.connect(('8.8.8.8', 80)) ip = s.getsockname()[0] finally: s.close() return ip in this way, no need to pre-define the SIMULATOR_HOST env variable at first. the pythonAPI only need to find out its own IP and detect if 8181 is listening on in runtime. container vs servicedifference between service and container: docker run is used to create a standalone container docker service is the one run in a distributed env. when create a service, you specify which container image to use and which commands to execue inside the running containers. There is only one command(no matter the format is CMD ENTRYPOINT or command in docker-compose) that docker will run to start your container, and when that command exits, the container exits. in swarm service mode, with default restart option(any), the container run and exit and restart again with a different containeID. check dockerfile, docker-compose and swarm mode lifecycle for details. docker container restart policy:docker official doc: start containers automatically no, simply doesn’t restart under any circumstance on-failure, to restart if the exit code has error. user can specify a maximum number of times Docker will automatically restart the container; the container will not restart when app exit with a successful exit code. unless-stopped, only stop when Docker is stopped. so most time, this policy work exactly like always, one exception, when a container is stopped and the server is reboot or the DOcker serivce is restarted, the container won’t restart itself. if the container was running before the reboot, the container would be restarted once the system restarted. always, tells Docker to restart the container under any circumstance. and the service will restart even with reboot. any other policy can’t restart when system reboot. similar restart policy can be found in : docker-compose restart policy docker service create restat-condition keep redisJQ alive in python scriptby default setup, redis server is keep restarting and running, which make the pythonapi service always report: redis.exceptions.ConnectionError: Error 111 connecting to xx.xxx.xxx:6379. Connection refused. so we can keep redisJQ alive in python script level by simply a while loop. for test purpose, we also make pythonAPI restart policy as none, so the service won’t automatically run even with empty jobQueue. the final test script can run in the following: 123docker service create --name lgsvl --network lgsvl-net --mode global xx.xx.xx.xxx:5000/lgsvldocker service create --name redis -p 6379:6379 --network lgsvl-net --mount source=jq-vol,target=/job_queue --constraint 'node.hostname==ubuntu' xx.xx.xx.xxx:5000/redisjq docker service create --name pythonapi --network lgsvl-net --mode global --mount source=jq-vol,target=/pythonAPI/job_queue --restart-condition none xx.xx.xx.xxx:5000/pythonapi use python variable in os.systemsample 1os.system("ls -lt %s"%your_py_variable) proxy in docker swarmHAProxy Routing external traffic into the cluster, load balancing across replicas, and DNS service discovery are a few capabilities that require finesse. but proxy can’t either assign a special IP to a special service, neither can expose the service with a fixed IP, so in our case, no helpful.]]></content>
<tags>
<tag>swarm</tag>
</tags>
</entry>
<entry>
<title><![CDATA[redis task queue (2)]]></title>
<url>%2F2020%2F04%2F28%2Fredis-task-queue-2%2F</url>
<content type="text"><![CDATA[backgroundcurrently, we add job_queue list inside Dockerfile by COPY the job_queue folder from local host to the docker image, which is not dynamically well, and can’t support additionaly scenarios. to design a redisJQ service that can used in swarm/k8s env, need consider DNS to make the servie available and shared volume to share the data in job queue to other services. ceph rbd driver for dockerceph can store files in three ways: rbd, block storage, which usually used with virtualization kvm object storage, through radosgw api, or access by boto3 APIs. cephfs, mount ceph as file system the first idea is from local host mount to remote volume(e.g. ceph storage) mount. there are a few popular rbd-drive plugins: Yp engineering AcalephStorage Volplugin rexray.io check ceph rbd driver to understand more details. to support rbd-driver plugin in docker, the ceph server also need support block device driver, which sometime is not avaiable, as most small ceph team would support one or another type, either objec storage or block storage. and that’s our situation. so we can’t go with rbd-driver plugin. another way is to use docker volume cephfs, similar reason our ceph team doesn’t support cephfs. ceph object storage accessas the ceph team can support boto3 API to access ceph, which gives us the one and only way to access scenarios: boto3. basically the redis_JQ first download all scenaio files from remote ceph through boto3 APIs, then scan the downloaded files into JQ, then feed into the python executors in front. s3 client aws cli 1234curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"unzip awscliv2.zipsudo ./aws/install/usr/local/bin/aws --version s3cmd access files in folders in s3 bucket12345678910def download_files(self, bucket_name, folder_name): files_with_prefix = self.s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder_name) scenario_basename = "/pythonAPI/job_queue/scenario" i = 0 for file_ in files_with_prefix["Contents"]: scenario_name = scenario_basename + "%d"%i + ".py" print(scenario_name) self.download_file(bucket_name, file_['Key'], scenario_name, False) time.sleep(0.01) i += 1 manage python modulesduring the project, we really need take care python packages installed by apt-get, pip and conda, if not there will conflicts among different version of modules: 1234import websocketsTrackbac: File "/usr/lib/python3/dist-packages/websockets/compatibility.py", line 8 asyncio_ensure_future = asyncio.async # Python < 3.5 so it’s better to use conda or python virtual-env to separate the different running envs. and install packages by conda install is better choice, than the global apt-get install: conda install ws conda install pandas conda install asammdf conda install botocore conda install sqlalchemy conda install websocket-client conda install redis conda install boto3 basic of python import module, any *.py file, where its name is the file name package, any folder containing a file named __init__.py in i, its name is the name of the folder. When a module named module1 is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named module1.py or a folder named module1 in a list of directories given by the variable sys.path sys.path is initialized from 3 locations: the directory containing the input script, or the current directory PYTHONPATH the installation-dependent default if using export PYTHONPATH directly, it works. 
but once defined in ~/.bashrc it isn't actually picked up in a conda env. it is simpler to add the root directory of the project to the PYTHONPATH environment variable, then run all the scripts from that directory's level and change the import statements accordingly. import searches for your packages in specific places, listed in sys.path, and the current directory is always appended to this list.
redis service
the common error: redis.exceptions.ConnectionError: Error 111 connecting to 10.20.181.132:6379. Connection refused., which basically means the system can't connect to the redis server, because by default redis only allows localhost access. so we need to configure a non-localhost IP to access the redis db.
check redis-server running status
ps aux | grep redis-server
netstat -tunple | grep 6379
redis-cli info
shutdown redis-server
sudo kill -9 $pid
redis-server & redis-cli
redis-server starts the redis server with the default config file at /etc/redis/redis.conf. a few items in the config file need attention:
bind, the default setting is to bind 127.0.0.1, which means the redis db can only be accessed through localhost. for our case, to allow the host IP (10.20.181.132), or even any IP, to access it, we need:
bind 0.0.0.0
logfile, by default at /var/log/redis/redis-server.log
requirepass, for security reasons, please consider this item
login from a client with the host IP:
redis-cli -h 10.20.181.132
basic operations of redis-cli
log in with redis-cli first, then run the following:
LPUSH your_list_name item1
LPUSH your_list_name item2
LLEN your_list_name
EXISTS your_list_name
redis service in docker
the following is an example of creating a redis service.
connect to the redis container directly
docker run -it redis-image /usr/bin/redis-server /etc/redis/myconfig.conf
in this way, the redis service will use its docker VIP, which can be checked with:
docker ps
docker inspect <container_id>
which will give something like:
"bridge": { "Gateway": "172.17.0.1", "IPAddress": "172.17.0.2",
then the redis server can be reached by:
redis-cli -h 172.17.0.2
connect from the host os
docker run -it -p 6379 redis-image /usr/bin/redis-server /etc/redis/myconfig.conf
the redis container has exposed 6379, which may map to another port on the host os; check:
docker ps
docker port <container_id> 6379 #gives the <external_port> on the host
redis-cli -h 10.20.181.132 -p <external_port>
run the redis service with the host network
docker run -it --network=host redis-image /usr/bin/redis-server /etc/redis/myconfig.conf
in this way, there is no bridge network or docker VIP; the host IP and port are used directly, so the following works:
redis-cli -h 10.20.181.132 -p 6379
a good way now is to map the host redis_port to the container redis_port, and use the second way to access redis:
docker run -it -p 6379:6379 redisjq /bin/bash
tip: make sure port 6379 on the host machine is free.
share volumes among multiple services
the problem is that the redisjq service downloads all scenario scripts in its own docker container, and only stores the scenario names in the redis db. when a redis_worker accesses the db, there are no real python scripts.
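a minimal sketch of the queue flow described above, assuming the list is called job_queue and the redis service is reachable at the host IP used in this post: the producer pushes only scenario names, the worker pops them, which is exactly why the scenario files themselves must live somewhere both containers can see.

```python
#!/usr/bin/env python3
# minimal sketch of the job-queue flow: the producer LPUSHes scenario *names*
# into a redis list, the worker BLPOPs them -- the scenario files themselves
# must be on a volume that both containers can access.
# host/port and the list name `job_queue` are assumptions for illustration.
import redis

r = redis.Redis(host="10.20.181.132", port=6379)

def produce(scenario_names):
    for name in scenario_names:
        r.lpush("job_queue", name)          # only the file name goes into redis

def work():
    while True:
        _, name = r.blpop("job_queue")      # blocks until a job is available
        path = f"/pythonAPI/job_queue/{name.decode()}"   # must exist via the shared volume
        print(f"running {path}")            # e.g. subprocess.run(["python3", path])

if __name__ == "__main__":
    produce([f"scenario{i}.py" for i in range(3)])
```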
so need to share this job-queue to all redis_workers mount volume 1docker run -it -p 6379:6379 --mount source=jq-vol,target=/job_queue redisjq /bin/bash start pythonapi to access the shared volume1docker run -it --mount source=jq-vol,target=/pythonAPI/job_queue redispythonapi /bin/bash referqemu/kvm & ceph: rbd drver in qemu 基于 Ceph RBD 实现 Docker 集群的分布式存储 rexray/rbd 参考 access cephFS inside docker container without mounting cephFS in host how to use folders in s3 bucket the definitive guide to python import statements]]></content>
<tags>
<tag>redis</tag>
<tag>swarm</tag>
</tags>
</entry>
<entry>
<title><![CDATA[play with k8s]]></title>
<url>%2F2020%2F04%2F09%2Fplay-with-k8s%2F</url>
<content type="text"><![CDATA[basic k8skubeadm initinit options can be: --apiserver-bind-port int32, by default, port=6443 --config string, can pass in a kubeadm.config file to create a kube master node --node-name string, attach node name --pod-network-cidr string, used to set the IP address range for all Pods. --service-cide string, set service CIDRs, default value is 10.96.0.0/12 --service-dns-domain string, default value is cluster.local --apiserver-advertise-address string, the broadcast listened address by API Server nodes components IP hostname components 192.168.0.1 master kube-apiserver, kube-controller-manager, kube-scheduler, etcd, kubelet, docker, flannel, dashboard 192.168.0.2 worker kubelet, docker, flannel ApiServerwhen first launch Kubelet, it will send the Bootstrapping request to kube-apiserver, which then verify the sent token is matched or not. 12345678910--advertise-address = ${master_ip}--bind-address = ${master_ip} #can't be 127.0.0.1 --insecure-bind-address = ${master_ip}--token-auth-file = /etc/kubernets/token.csv--service-node-port-range=${NODE_PORT_RANGE} how to configure master node cluster IPit’s the service IP, which is internal, usually expose the service name. the cluse IP default values as following: 12--service-cluster-ip-range=10.254.0.0/16--service-node-port-range=30000-32767 k8s in practice blueKing is a k8s solution from TenCent. here is a quickstart: create a task add agnet for the task run the task & check the sys log create task pipeline (CI/CD) create a new service in k8s create namespace for speical bussiness create serivces, pull images from private registry hub 12kubectl create -f my-nginx-2.yamlkubctl get pods -o wide how external access to k8s pod service ?pod has itw own special IP and a lifecyle period. once the node shutdown, the controller manager can transfer the pod to another node. when multi pods, provides the same service for front-end users, the front end users doesn’t care which pod is exactaly running, then here is the concept of service: service is an abstraction which defines a logical set of Pods and a policy by which to access them service can be defined by yaml, json, the target pod can be define by LabelSeletor. a few ways to expose service: ClusterIP, which is the default way, which only works inside k8s cluster NodePort, which use NAT to provide external access through a special port, the port should be among 8400~9000. then in this way, no matter where exactly the pod is running on, when access *.portID, we can get the serivce. LoadBalancer 123kubectl get serviceskubectl expose your_service --type="NodePort" --port 8889kubctl describe your_service use persistent volume access external sql use volume volume is for persistent, k8s volume is similar like docker volume, working as dictory, when mount a volume to a pod, all containers in that pod can access that volume. 
EmptyDir hostPath external storage service(aws, azure), k8s can directly use cloud storage as volume, or distributed storage system(ceph): sample 12345678910111213141516171819202122232425262728293031apiVersion: v1kind: Podmetadata: name: using-ebsmetadata: name: using-cephspec: containers: -image: busybox1 name: using-ebs volumeMounts: -mountPath: /test-ebs name: ebs-volume -image: busybox2 name: using-ceph volumeMounts: -name: ceph-volume mountPath: /test-ceph volumes: -name: ebs-volume awsElasticBlockStore: volumeID: <volume_id> fsType: ext4 -name: ceph-volume cephfs: path: /path/in/ceph monitors: "10.20.181.112:6679" secretFile: "/etc/ceph/admin/secret" containers communication in same podfirst, containers in the same pod, the share same network namespace and same iPC namespace, and shared volumes. shared volumes in a pod when one container writes logs or other files to the shared directory, and the other container reads from the shared directory. inter-process communication(IPC) as they share the same IPC namespace, they can communicate with each other using standard ipc, e.g. POSIX shared memory, SystemV semaphores inter-container network communication containers in a pod are accessible via localhost, as they share the same network namespace. for externals, the observable host name is the pod’s name, as containers all have the same IP and port space, so need differnt ports for each container for incoming connections. basically, the external incoming HTTP request to port 80 is forwarded to port 5000 on localhost, in pod, and which is not visiable to external. how two services communicate ads_runner 12345678910111213apiVersion: v1kind: Servicemetadata: name: ads_runnerspec: selector: app: ads tier: api ports: -protocol: TCP port: 5000 nodePort: 30400 type: NodePort if there is a need to autoscale the service, checkk8s autoscale based on the size of queue. redis-job-queue 123456789101112apiVersion: v1kind: Servicemetadata: name: redis-job-queuespec: selector: app: redis tier: broker ports: -portocol: TCP port: 6379 targetPort: [the port exposed by Redis pod] ads_runner can reach Redis by address: redis-server:6379 in the k8s cluster. redis replication has great async mechanism to support multi redis instance simutanously, when need scale the redis service, it is ok to start a few replicas of redis service as well. redis work queuecheck redis task queue: start a storage service(redis) to hold the work queue create a queu, and fill it with messages, each message represents one task to be done start a job that works on tasks from the queue referjimmysong BlueKing configure manage DB k8s: volumes and persistent storage multi-container pods and container communication in k8s k8s doc: communicate between containers in the same pod using a shared volume kubeMQ: k8s message queue broker 3 types of cluster networking in k8s]]></content>
<tags>
<tag>k8s</tag>
</tags>
</entry>
<entry>
<title><![CDATA[redis task queue]]></title>
<url>%2F2020%2F04%2F09%2Fredis-task-queue%2F</url>
<content type="text"><![CDATA[backgroundredis is a in-memory database, like SQlite, but the other part of redis, is that it can use to write message queue for task scheduler. job/task scheduler is a popular automatic tool used in cloud/HPC, when need to deal with lots of taskes/jobs. for ADS simulation, a job scheduler can be used to manage the scenarios feed into the simulator. where the existing scenarios are stored as items in redis, and the redis server looks like a shared volume, which can be access by any worker services through redis_host. where simulator can enpacakged in a redis_worker. redisredis commandsredis is database, so there are plenty of commands used to do db operation. e.g. 123456GET, SET #stringHGET, HSET, #hash tableLPUSH, LPOP, BLPOP #listSADD, SPOP, #setPUBLISH, SUBSCRIBE, UNSUBSCRIBE, PUBSUB #pub&sub... redis in pythonthe following is from redis-py api reference get/set string 123r = redis.Redis(host='localhost', port=6379, db=0)r.set('foo', 'bar')r.get('foo') connection pools redis-py uses a connection pool to manage connections to a Redis server. 12pool = redis.ConnectionPool(host='localhost', port=6379, db=0)r = redis.Redis(connection_pool=pool) connections connectionPools manage a set of Connection instances. the default connection is a normal TCP socket based; also it can use UnixDomainSocketConnection, or customized protocol. connection maintain an open socket to Redis server. when these sockets are disconnected, redis-py will raise a ConnectionError to the caller, also redis-py can issue regular health check to assess the liveliness of a connection. parsers parser provides the way to control how response from Redis server are parsed, redis-py has: PythonParser and the default HiredisParser. response callbacks the redis-py has its own client callbacks in RESPONSE_CALLBACKS. custom callbacks can add on per-instance using set_response_callback(command_name, callfn) pub/sub PubSub object can subscribes/unsubscribes to channels or patterns and listens for new messages. 123456r = redis.Redis(...)p = r.pubsub()p.subscribe('my-first-channel', ...)p.psubscribe('my-first-patttern', ...)p.unsubscribe('my-first-channel')p.punsubscribe('my-first-pattern') every message read from a PubSub instance will be a dict with following keys: - type, e.g. subscribe, unsubscribe, psubscribe, message e.t.c - channel - pattern - data redis-py allows to register callback funs to handle published messages. these message handlers take a single argument the message. when a message is read with a message handler, the message is created and passed to the message handler: 123456def a_handler(message): print message['data']p.subscribe(**('my-channel': a_handler})r.publish('my-channel', 'awesome handler')p.get_message() get_message() get_message() use system’s select module to quickly poll the connection’s socket. if there’s data available to be read, get_message() will read it; if there’s no data to be read, get_message() will immediately return None: 12345while True: message = p.get_message() if message: #do something with the message time.sleep(0.01) listen() listen() is a generator(e.g. yield keyword), which blocks until a message is avaiable. if the app is ok to be blocked until next message avaiable, listen() is an easy way: 12for message in p.listen(): #do something with the message run_in_thread() run an event loop in a separate thread. 
run_in_thread() returns a thread object, and it is simply a wrapper around get_message(), that runs in a separate thread, essentially creating a tiny non-blocking event loop. since it’s running in a separate thread, there is no way to handle message that aren’t automatically handled with registered message handlers. 1234p.subscribe(**{'my-channel': a_handler})thread = p.run_in_thread(sleep_time = 0.01)#when need shut downthread.stop() redis task queue in k8sfine parallel processing using work queue is an good example of how to use redis as task queue. first, fill the existing task lists into redis database(server) as a shared volume, then start mutli worker services to the shared volume to get the job to run.]]></content>
<tags>
<tag>redis</tag>
</tags>
</entry>
<entry>
<title><![CDATA[design a data center for ADS applications]]></title>
<url>%2F2020%2F04%2F06%2Fdesign-a-data-center-for-ADS-applications%2F</url>
<content type="text"><![CDATA[backgroundA system’s back end can be made up of a number of bare metal servers, data storage facilities, virtual machines, a security mechanism, and services, all built in conformance with a deployment model, and all together responsible for providing a service. points to consider it’s the primary authority and responsibility of the back end to provide a built-in security mechanism, traffic control and protocols. a central server is responsible for managing and running the system, systematically reviewing the traffic and client request to make certain that everything is running smoothly. service viewportbusniess servicesthe most common busniess services or application-as-a-service(aaas) in ADS data center includes: massively scenario simualtion open-loop re-play simulation AI training sensor data analysis test driven algorithms dev platform as a servicebusniess services can be considered as the most abstract level services, paas consider how to support the upper level needs. a few components are obvious: distributed data storage(aws s3, ceph) massively data analysis(hadoop) massively compute nodes(k8s/docker) vdi server pool sql web server for end-user UI IaaSto support PaaS, we need CPUs, GPUs, network, either in physical mode or virtual mode. the common private cloud vendor would suggest a general virtualization layer, to manage all resources in one shot. but there is always an balance between the easy manage and performance lost. for large auto industry OEMs, no doubt easy manage is crucial. so it’s suggested to implement virtual layer(either vmware or customized kvm). if not, I wonder the self-maintainaince will be a disaster in future. security in cloudwho is Cybersecurity professionalthe guy who provide security during the development stages of software systems, networks, and data centers. must make security measures for any information by designing various defensive systems and strategies against intruders. The specialist must create new defensive systems and protocols and report incidents. Granting permissions and privileges to authorized users is also their job. The cybersecurity professional must maintain IT security controls documentation, recognize the security gaps, and prepare an action plan accordingly. Cybersecurity professionals enable security in IT infrastructure, data, edge devices, and networks. Azure Security: best practices control network access At this ring you typically find Firewall policies, Distributed Denial of Service (DDoS) prevention, Intrusion Detection and Intrusion Prevention systems (IDS/IPS), Web Content Filtering, and Vulnerability Management such as Network Anti-Malware, Application Controls, and Antivirus. The second ring is often a Network Security Group (or NSG) applied to the subnet. Network Security Groups allow you to filter network traffic to and from Azure resources in an Azure virtual network. all subnets in an Azure Virtual Network (VNet) can communicate freely. By using a network security group for network access control between subnets, you can establish a different security zone or role for each subnet. As such, all subnets should be associated with a properly configured Network Security Group. With a virtual server, there is a third ring which is a Network Security Group (NSG) applied to virtual machines network interfaces, avoid exposure to the internet with a dedicated WAN connection. Azure offers both site-to-site VPN and ExpressRoute for this purpose. 
disable remote access (ssh/rdp): disable remote access to VMs from the internet. SSH/RDP should only be provided over a secure, dedicated connection using Just-In-Time (JIT) VM access. The Just-In-Time VM access policy is configured at the NSG to lock down the virtual machine's remote management ports. When an authorized user requires access to the VM, they use Just-In-Time VM Access to request access for up to three hours. After the requested time has elapsed, Azure locks the management ports down again to help reduce susceptibility to an attack.
update vm: you need to run antivirus and anti-malware, and apply system updates for VMs hosted in Azure. safeguard sensitive data; enable encryption; shared responsibility.
aws security: instance level security. AWS security groups provide security at the protocol and port access level, working much the same way as a firewall: a group contains a set of rules that filter traffic coming in and out of an EC2 instance. OS security, patch management, key pairs (public and private keys to log in to an EC2 instance).
aws security: network ACLs and subnets: network level security.
aws security: bastion hosts. connectivity flows from an end-user to resources on a private subnet through a bastion host: keep the bastion host updated; skip the bastion if using Session Manager, which securely connects to private instances in a virtual private cloud without needing a bastion host or key pairs; you can now push keys for short periods of time and use IAM policies to restrict access as you see fit, which reduces your compliance and audit footprint as well. A NAT (network address translation) gateway instance allows private instances outgoing connectivity to the internet while blocking inbound traffic from the internet. VPC (virtual private cloud) peering.
aws security: identity and access management (IAM) governs and controls user access to VPC resources; it achieves this through Users/Groups/Roles and Policies.
network topology in cloud
network topology: at first, I would consider the data center network topology and security mechanism from the following four points:
internal network topology: the data center will have an internal network to connect the infrastructure, e.g. data storage nodes, k8s compute nodes, hadoop compute nodes, web server nodes, vdi server nodes etc.
network gateway to end-users: there should be a single network gateway in the data center, which is the only network I/O for end-user access.
network gateway to other private/public clouds: we also need to connect to other IT infrastructure, so we need another network gateway. the two gateways above can be either virtual or physical gateways, depending on our hardware, e.g. vlan, bridge, or a physical gateway.
admin pass-through network: the network gateway is the normal user access port, but for admins, especially for system management, trouble-shooting etc., we need a pass-through network which connects directly to the internal network of the data center; the admin pass-through network is speed limited, so it is only for special admin usage.
h3c: a brief discussion of data center network architecture evolution. the access layer connects all compute nodes, and in larger data centers it usually takes the form of top-of-rack switches; the aggregation layer interconnects the access layer, acts as the layer-2/layer-3 boundary of its aggregation zone, and also hosts services such as firewalls and load balancers; the core layer interconnects the aggregation layers and provides layer-3 communication between the whole data center and external networks. in a traditional data center, servers mainly provide externally facing business access, and different businesses are isolated by security zones and VLANs. a zone usually concentrates the compute, network and storage resources that its business needs; different zones either forbid mutual access or interact over layer 3 through the core switches, so most of the data center traffic is north-south.
under this design, compute resources cannot be shared across zones, and low resource utilization becomes more and more of a problem. with virtualization and cloud management technologies, the resources of the individual zones are pooled, so resources inside the data center can be used effectively. as these new technologies are adopted, new workloads such as VM migration, data synchronization, data backup and collaborative computing are deployed inside the data center, and east-west traffic inside the data center increases sharply.
h3c: two-layer network arch. layer-3 interconnection, also called data center front-end network interconnection: the "front-end network" is the data center's exit towards the enterprise campus; the front-end networks of different data centers are interconnected over IP, and campus or branch clients access each data center through the front-end network. layer-2 interconnection, also called data center server network interconnection: at the server access layer of different data centers, a cross-data-center layer-2 network (vlan) is built, to support scenarios such as server clusters or dynamic VM migration. SAN interconnection, also called back-end storage network interconnection: transport technologies are used to replicate data between the disk arrays of the primary and disaster-recovery centers. the business requirement of layer-2 interconnection is to keep server clusters highly available; its design targets small and medium enterprise customers (IP networks).
openstack neutron: network for cloud. data flows in a data center: the management (API) network, basically the internally managed network; the user network; the external network, including vpn and firewall; and the storage network, connecting compute nodes to storage nodes.
NSX arch: the left-most part is the compute nodes for customers' business; their dataflow includes user-network, storage and internal-network, all of which require 3 NICs. the middle part is infrastructure, including management nodes and shared storage, which provides IP-based storage for the left-most part. the right edge part is the external (internet) network services for users, including the network to users as well as the network to the Internet, firewall, public IP address translation etc.
ceph subnets: when considering ceph storage for k8s compute nodes, the public network also includes the network switches to k8s, as well as the normal user data access switches. cluster network (Gb NIC, including osd and monitor); ceph client (to k8s) network (Gb); ceph admin/user network (Mb).
k8s subnets: understand k8s networking: pods, service, ingress. pods: all containers in one pod share the same network namespace; the network namespace of a pod is different from that of the host machine, but the two are connected by the docker bridge. services: handle load balancing among pods and encapsulate the pods' IPs, so we don't directly deal with the pods' local dynamic IPs. how do we access a k8s service from outside, or how does a user access a k8s cluster hosted in a remote data center? ingress for k8s. flannel sets up a layer-2 overlay network; the IP of each pod is assigned by flannel, and each node has a flannel0 virtual NIC used for node-to-node communication.
gpu vdi subnets: as mentioned in gpu vdi, most solutions have a customized vdi client; the VM manager's internal network is handled by the vendors, maybe communicating through one Gb NIC, and the client-server link is plain TCP/IP.
webserver subnets: a few things in mind: use VMs. web servers are better deployed in VMs, so whenever there is a hardware failure it can be detected and the server automatically moved to another VM. communicate with in-house services: in an ADS data center, most web applications need to access data from either sql or the storage services directly, which means the web server needs both an external ingress service and internal IP access.
network manager: the subnet classification above only considers each component by itself; for the data center as a whole, it's better to manage all internal subnets and access to the external Internet in one module: the network manager. as mentioned in the previous section, security in cloud, the network manager module can further add security mechanisms.
refer: cloud arch: front end & back end; Microsoft Azure security tech training courses; introduction to the Foundation certificate in Cyber Security; datacenter network: topology and routing; miniNet for data center network topology; openstack neutron: two-layer network]]></content>
<tags>
<tag>data center</tag>
</tags>
</entry>
<entry>
<title><![CDATA[gpu vdi]]></title>
<url>%2F2020%2F04%2F02%2Fgpu-vdi%2F</url>
<content type="text"><![CDATA[backgroudpreivously, reviewed: hypervisor and gpu virtualization vmware introduction this blog is a little bit futher of vmware and AMD GPU virtualization sln. AMD virtualizationS7150x2for remote graphic workstation, usually we separate host machine and local machines, where host machine is located in data center, and local machines are the end-user terminals at offices. the host OS can be Windows 7/8, Linux, and hypervisor can be vmware ESXi 6.0; guest os can be windows7/8, supported API includes: DX11.1, OpenGL since S7150 has no local IO, os there is no display interface, just like a Nvidia Tesla GPU. SR-IOV sr-iov arch Physical Function (PF) it’s PCI-Express function of a network adapter that supports single root I/O virtualization(SR-IOV) interface. PF is exposed as a virtual network adapter(vLan) in the host OS, and the GPU driver in install in PF. Virtual Function (VF) it’s a lightweight PCIe function on a network adapter that supports SR-IOV. VF is associated with the PF on the network adapter, and represents a virtualized instance of the network adapter. each VF has its own PCI configuration space, and shares one or more physical resources(e.g. GPU) on the network adapter. GPU SR-IOV sr-iov basically split one PF(a PCIe resource) into multi VF(virtual PCIe resource). and each vf has its own Bus/Slot/Function id, which can used to access physical device/resources(e.g. GPU); Nvidia Grid vGPU is a different mechanism, where virtualization is implemented only in host machine side to assign device MAC address. GPU resource managment display GPU PF mangae the size of frameBuffer to each vf, and display virtualization. security check PF also do an address audit check and security check VF schedule GPU vf scheduler is similar as CPU process time-split. in a certain time period, the gpu is occupied by a certain vf. Multiuser GPU(MxGPU)AMD MxGPU is the first hardware-based virtualized GPU solution, based on SR-IOV, and allows up to 16 vm per GPU to work remotely. now we see two GPU virtualization solutions: 12* Nvidia Tesla vGPU* AMD SR-VIO MxGPU vGPU is more software-based virtualization, but the performance is a little better; while MxGPU is hardware based. vmware products the license-free products, e.g. vSphere Hypervisor, VMware Remote Console the licensed and 60days-free products, e.g. vSAN, Horizon 7, vSphere vSpherevSphere is the virtualization(hypervisor) layer of vmware products. there are two components: ESXi and vCenter Server. exsi is the core hypervisor, and vcenter is the service to mange multi vm in a network and host resources pool. install and setupa few steps including: install ESXi on at least one host, either interactively or install through vSphere auto deploy, which include vServer. basically, esxi is free, and can be install on system as the hypervisor layer for any future vms. setup esxi, e.g. esxi boot, network settings, direct console, syslog server for remote logging deploy or install vCenter and services controller Horizon client devices Horizon client the client software for accesing remote desktops and apps, which will run on client devices. after logging in, users select from a list of remote desktops and apps that they are authorized to use. and admin can configure Horizon client to allow end users to select a display protocol. Horizon agent it’s installed on all vms, physical machines, storage server that used as sources for remote desktops and apps. 
if the remote desktop source is a vm, then first need install Horizon Agent service on that vm, and use the vm as a tepmplate, when create a pool from this vm, the agent is automatically installed on every remote desktop. Horizon admin used to configure Horizon connection server, deploy and manage remote desktops and apps, control user authentication e.t.c. Horizon connection server serve as a broker for client connections. a rich user experience usb devices with remote desktops and apps basically can configure the ability to use USB devices from virtual desktop real-time video for webcams basically can use local client(end-user terminal)’s webcam or microphone in a remote desktop or published app. 3d graphics with Blast or PCoIP display protocol enable remote desktop users to run 3D apps, e.g. google earch, CAD. vSphere 6.0+ supports NVIDIA vGPU, basically share GPU among vms, as well as support amd GPU by vDAG, basically share gpu by making GPU appear as multiple PCI passthrough devies. desktop or app poolfirst create one vm as a base image, then Horizon7 can generate a pool of remote desktops from the base image. similar for apps. the benefit of desktop pool, if using vSphere vm as the base, is to automate the process of making as many identical virtual desktops as need, and the pool has manage tools to set or deploy apps to all virtual desktops in the same pool. for user assignment, either dedicated-assignment pool, which means each user is assigned a particular remote desktop adn returns to the same v-desktop at each login. it’s a one-to-one desktop-to-user relationship; or floating-assignment pool, basically users can shift to any v-desktop in the pool. security features Horizon Client and Horizon Administrator communicate with a Horizon Connection Server host over secure HTTPS connections. integrate two-factor authentication for user login restrict remote desktop access by matching tags in v-desktop pool, but further restriction need design network topology to force certain clients to connect through. referevmware product lists amd S7150 review GPU SR-IOV windows driver tech: PF & VF vSphere install and setup doc horizon7 install and setup doc]]></content>
<tags>
<tag>vdi</tag>
<tag>gpu</tag>
</tags>
</entry>
<entry>
<title><![CDATA[vmware introduction]]></title>
<url>%2F2020%2F03%2F28%2Fvmware-virtualization%2F</url>
<content type="text"><![CDATA[backgroundsooner or later, we are in the direction of gpu virtualization, previously hypervisor and gpu virtual is the first blog. recently, I’d went through openstack/kvm, vnc and now vmware. there is no doubt, any licensed product is not my prefer, but on the hand, it’s more expensive to hire an openstack engineer, compared to pay vmware. vmware is not a windows-only, I had to throw my old mind first. the basic or back-end idea looks very similar to kvm. anyway, the core is hypervisor layer. once get understand one type of virtualization, it’s really easy to understand another. VMware ESXiESXi hypervisor is a Type1 or “bare metal” hypervisor, also called vmware hypervisor, is a thin layer of software that interacts with the underlying resources of a physical computer(host machine), and allocates those resources to other os(guest machine).and can support remotely access, just like kvm. check vSphere doc about how to set BIOS and manage ESXi remotely. BIOS boot configuration can be setted by configuring the boot order in BIOS during startup or by selecting a boot device from the boot device selection menu. the system BIOS has two options. One is for the boot sequence (floppy, CD-ROM, hard disk) and another for the hard disk boot order (USB key, local hard disk). VMware workstationworkstation support multi guest os in a single host os(either windows or Linux), is a Type2 hypervisor, run as an app on host OS. one limitation of workstation is it only works in local host, can’t access remotely. free version, workstation player licensed version, workstation prof in one word, workstation is good enough to multiple hardware usage, but not useful if remote access is required VMware vSpherethe arch of vSphere has three layers: virtualization layer management layer interface layer(web, sdk, cli, virtual console) ESXi is the core hypervisor for VMware products, and also is the core of the vSphere package, the other two is: vSphere client and vSphere server. vSphere server is enterprise-oriented, which is run based on ESXi layer. vSphere client is a client console/terminal. free version: vSphere hypervisor licensed fee version : vSphere with vServer nowadays there are no limitations on Physical CPU or RAM for Free ESXi. here 12345Specifications: Number of cores per physical CPU: No limitNumber of physical CPUs per host: No limitNumber of logical CPUs per host: 480Maximum vCPUs per virtual machine: 8 virtual desktop integration (VDI)VDI virtualize a desktop OS on a server. VDI offers centralized desktop management. the vmware tool is VMware Horizon HorizonVMware Horizon can run VDI and apps in IT data center, and make VDI and apps as services to users. Horizon auto manage VDI and apps by simple setup configure files, then deliver apps or data from data center to end-user. the modules in Horizon is extended-able and plug-in available: physical layer, virtualization layer, desktop resource layer, app resource layer and user access. Horizon basically delivers desktops and apps as a service. there are three versions: Horizon standard, a simple VDI. 
Horizon advanced, can deliver desktops and apps through a unified workspace Horizon enterprise, with a closed-loop management adn automation the Horizon 7 has new features: Blast extrem display protocol instant clone provisioning vm app volumes app delivery user env manager integrated into remote desktop session host(RDSH) sessions gpu support in vmwarefor vSphere, PCI passthrough can be used with any vSphere, including free vSphere Hypervisor. the only limitation is HW, may not supprot virtual well. GPU remotely accessible is our first-priority concern. but vmware recommend their Horzion with NV’s vGPUwhich has better flexibility and scalability. Horizon support vGPU, the user must install the appropriate vendor driver on the guest vm, all graphics commands are passed directly to GPU without having to be translated by the hypervisor. a vSphere Installation Bundle(VIB) is installed, which aids or perform the scheduling. depending on the card, up to 24 vm can share a GPU. most NV’s GPU which has vGPU feature can support. on the other hand, the price of vGPU products(e.g. T4, M10, P6, V100, RTX8000 e.t.c) are 5 ~ 10 times higher than normal customer GPUs. e.g. GeFS not 5~10 times better. and the license fee for vgpu is horrible. however, most enterprise still choice the Horzion and vGPU solution, even with this high cost. VMware compatibility guide GPU VDI service in cloud tencent gpu cloud the gpu types for video decoding is Tesla P4, for AI and HPC is Tesla v100, and for image workstation(VDI) is an AMD S7150. ali gpu cloud desktop the product is call GA1(S7150), which is specially for cloud desktop. s7150x2 MxGPU with Horizon 7.5 vnc vs vmvirtual network computing (vnc), applications running on one computer but displaying their windows on another. VNC provides remote control of a computer at some other location, any resources that are avaiable at the remote computer are available. vpn simply connect you to a remote network. no doubt, vm is much heavier than vnc. check this blog for compartion from vdi(a vm app) to vnc. vnc can’t tell if the remote is a physical server or a virtual server. come to our user case, we need about 100 separated user space, so virtualization provide better HW efficient and security, compared to deploy a single bare metal OS on the physical machine. there are a few Linux based vnc client/server, e.g. vncviewer CLI, as well as which supports OpenGL well, which helps to support better GPU usage. virtualGL x11vnc refeean essential vmware introduction from IBM what are vsphere, esxi, vcenter in Chienese vSphere Hypervisor vmware vsphere doc how to enable nvidia gpu in passthrough mode on vSphere nvidia vgpu for vmware release notes how to enable vmware vm for GPU pass-through openstack PCI passthrough how can openGL graphics be displayed remotely using VNC vmware Horizon introduction in Chinese vmware ESXi 7.0 U1 test env build up 云游戏能接盘矿卡市场吗 王哥哥的博客 企业存储的博客 in-depth: nv grid vGPU with vmware horizon 6.1 nvidia gpus recommended for virtualization GPU SRIOV and AMD S7150]]></content>
<tags>
<tag>vmware</tag>
</tags>
</entry>
<entry>
<title><![CDATA[kvm in linux (2)]]></title>
<url>%2F2020%2F03%2F27%2Fkvm-in-linux-2%2F</url>
<content type="text"><![CDATA[history of terminalin history, computer is so huge and hosted at a big room, while the operator stay in another office room with another small machine as terminal to the host computer. as the host computer can run multi-users, each of which needs a terminal. when times goes to personal computer(PC), there is no need for multi-users login, so each PC has integrated with I/O device(monitor and keybord) nowadays, we have plenty of end-user terminals, e.g. smart phone, smart watch, Ipad e.t.c, all of these are terminal, as the real host computer is in cloud now. in a word, any device that can accept input and display output is a terminal, which play the role as the human-machine interface. three kinds of terminalssh is TCP/IP protocol, what’s inside is the remote terminal streaming flow. the TCP/IP plays as the tunnel. of course, any communication protocol can play the role as the tunnel. local terminal, with usb to keyboard and monitor. serial terminal, the guest machine connect to the host machine, who has keyboard and monitor. basically the guest machine borrow the host machine’s IO, which needs the host machine to run a terminal elmulator. tcp/ip tunnel terminal, e.g. ssh both local terminal and serial terminal, directly connect to physical device, e.g. VGA interface, usb, serial e.t.c, so both are called physical terminal. ssh has nothing to do with physical device. ttyin Linux, /dev/ttyX represents a physical-terminal. from tty1 to tty63 are all local terminal. during Linux kernal init, 63 local terminals are generated, which can be switched by Fn-Alt-Fx, (x can be 1, 2, 3…). each current terminal is called focus terminal. focus terminal is taken as global variable, so any input will transfer to current focus terminal. for serial, there is no focus terminal. in Linux, /dev/console represents current focus terminal, consider as a pointer to current focus terminal, wherever write sth to /dev/console will present at current focus terminal. /dev/tty is the global variable itself. whichever terminal you are working with, when write to /dev/tty, it will be present. /dev/ttyS#num# represents serial terminal. getty & loginin multi-terminal time, each terminal must bind to one user, user must first login the terminal, getty is the login process, which is called in init. after login successfully, the terminal tty#num# can be owned by the user. there are a few differenet version of getty, e.g. agetty e.t.c. pty & ptspty stands for pseudo-tty, pty is a (master-slave) pair terminal device, including: pts pseudo-terminal slave, and ptmx pseudo-terminal master. a few concepts as following: serial terminal /dev/ttySn pseudo-termianl /dev/pty/ controlling terminal /dev/tty console terminal /dev/console Linux Serial Consoleserial communicationin old-style PC, serial is COM interface, also called DB9 interface, with RS-232 standard. each user can connect to host machine through a terminal. console is same as terminal to connect user to host machine, but console is higher priority than terminal. nowadays less difference between terminal and console. Teletype is the earliest terminal device, tty is physical or pseudo terminal connect to the host machine, nowadays tty also used for serial device. serial is the connection from terminal/keyboard to dev-board. 1ls /dev/tty* configurationLinux kernel must be configured to use the serial port as its console, which is done by passing the kernel the console parameter when the kernel is started by the boot loader. 
the init system should keep a process running to monitor the serial console for logins; the monitoring process is traditionally named getty. a number of system utilities need to be configured to make them aware of the console, or configured to prevent them from disrupting the console.
serial port: Linux names the first serial port /dev/ttyS0, the second serial port /dev/ttyS1, and so on. most boot loaders have yet another naming scheme: the first serial port is numbered 0, the second serial port is numbered 1.
configure GRUB boot loader: configure GRUB to use the serial console: info grub; /boot/grub/grub.cfg; serial --unit=0 --speed=9600 --word=8 --parity=no --stop=1; terminal serial.
init system: getty is started by init: co:2345:respawn:/sbin/getty ttyS0 CON9600 vt102, where co is an arbitrary entry representing the console; 2345 are the run levels where this entry gets started; respawn means re-run the program if it dies; /sbin/getty ttyS0 CON9600 vt102 is getty connecting to /dev/ttyS0 with settings for 9600 bps and a VT100-style terminal.
virsh console: how to connect to an ubuntu kvm virtual machine through the serial console. in earlier versions and distributions you had to configure the serial console in the grub file, but in Ubuntu it's easy and reliable as most of the configuration and settings are already done in the OS.
setup: run an ubuntu 14.04 guest machine on an ubuntu 16.04 host machine. to set up the serial console, we have to connect to the guest machine and log in as the root user.
login through SSH: connect to the KVM guest machine through ssh from the host machine: ssh 192.168.122.1; hostname.
connect through VNC: connect to the guest machine through a VNC viewer and set up the serial console. There are times when we need to troubleshoot virtual machines with unknown status, such as hangs, IP address issues, password problems, serial console hangs etc. In such scenarios, we can rely on the VNC configuration of KVM guest machines. a vnc viewer is a graphical viewer, so we only need to add a graphics component in config.xml: <graphics type='vnc' port='-1' autoport='yes' passwd='mypassword'/>. run virsh vncdisplay #vm_name# to get the vnc (server) address, which can then be opened by a vnc viewer. here, the kvm virtual machine implements a vnc server, and any vnc viewer on the same physical machine can access this vnc server, even without external networking.
configure serial console in ubuntu guest: after getting a login console, we can start and enable the serial console with: systemctl start serial-getty@ttyS0; systemctl enable serial-getty@ttyS0, which prints: Created symlink /etc/systemd/system/getty.target.wants/serial-getty@ttyS0.service → /lib/systemd/system/serial-getty@.service. now we can connect to the serial console with virsh console: virsh console vm_name. after installation, reboot first; the host machine then carries both the host OS and the guest OS, and you can exit the guest console with Ctrl + ] or log in to the guest with virsh console #guest_vm#.
in summary, virsh console implements a serial console for the kvm guest machine, connecting the guest to the host through a serial device; it is not ssh, so it requires some knowledge about serial terminals.
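before digging into the hang cases below, it can help to confirm programmatically that the guest is running and actually has a serial/console device defined. the following is a minimal sketch using the libvirt Python bindings; the guest name "v1" and the qemu:///system URI are illustrative assumptions, not part of the original setup.
# a minimal sketch, assuming the libvirt-python bindings are installed
# (pip install libvirt-python) and a guest named "v1" already exists;
# it checks that the guest is running and has a serial/console device
# before you try `virsh console v1`.
import libvirt                      # official python bindings for libvirt
import xml.etree.ElementTree as ET

conn = libvirt.open("qemu:///system")   # same hypervisor URI virsh uses by default
dom = conn.lookupByName("v1")           # "v1" is a placeholder guest name

print("running:", bool(dom.isActive()))

# dump the live domain XML and look for <serial>/<console> devices
tree = ET.fromstring(dom.XMLDesc(0))
devices = tree.find("devices")
for dev in devices.findall("serial") + devices.findall("console"):
    target = dev.find("target")
    print(dev.tag, dev.get("type"), target.attrib if target is not None else {})

conn.close()
if no serial or console device shows up in the dumped XML, virsh console has nothing to attach to, which is one common reason for the hang described next.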
virsh console hangsvirsh console vm hangs at: Escape character is ^], which can exit by ctrl + ] sol1:go to guest machine/terminal, and edit /etc/default/grub, appending; 12GRUB_TERMINAL=serialGRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1" then execute: 12update-grubreboot the problem here, as the KVM vm shares the kernel of the host machine, if update grub, the host machine will reboot with serial connection(?) Centos virsh console hangs go to /etc/securetty and append ttyS0 append S0:12345:respawn:/sbin/agetty ttyS0 115200 to /etc/init/ttyS0.conf /etc/grub.conf(Centos) I only found /boot/grub/grub.cfg in ubuntu. In the kernel line, appending console=ttyS0. but there is no kernel line in ubuntu grub.cfg. Ubuntu virsh console hangs123systemctl disable systemd-networkd-wait-onlinesystemctl enable [email protected] start [email protected] which gives: 12Mar 27 11:17:18 ubuntu systemd[1]: Started Serial Getty on ttyS0.Mar 27 11:17:18 ubuntu agetty[445120]: /dev/ttyS0: not a tty check in /dev/ttyS* : 123crw-rw---- 1 root dialout 4, 73 Mar 24 09:20 ttyS9crw--w---- 1 root tty 4, 64 Mar 24 09:20 ttyS0crw-rw---- 1 root dialout 4, 65 Mar 24 09:20 ttyS1 interesting here, ttyS0 belongs to tty group, all otehr ttyS#num# belongs to dialout group. tty and dialoutchange /dev/ttyS0 to tty group can’t access /dev/ttyS add USER to tty/dialout group 12sudo usermod -a -G tty $USERsudo usermod -a -G dialout $USER reboot and go on referremote serial console HOWTO remote serial console HOWTO in Chinese understand Linux terminal history in Chinese Linux terminal introduction in Chinese serial communication in Chinese arhLinux: working with the serial console gnu org: grub geekpills: start vnc remote access for guest operating systems]]></content>
<tags>
<tag>kvm</tag>
<tag>linux</tag>
</tags>
</entry>
<entry>
<title><![CDATA[kvm/libvert in linux (1)]]></title>
<url>%2F2020%2F03%2F24%2Fkvm-libvert-in-linux-1%2F</url>
<content type="text"><![CDATA[kvm backgroundkernel based virtual machine(KVM), is a Linux kernel module, which transfer Linux to a Hypervisor, which depends on the ability of hardware virtualization. usually the physical machine is called Host, and the virtual machine(VM) run in host is called Guest. kvm itself doesn’t do any hardware emulator, which needs guest space to set an address space through dev/kvm interface, to which provides virtual I/O, e.g. QEMU. virt-manager is a GUI tool for managing virtual machines via libvirt, mostly used by QEMU/KVM virtual machines. check kvm model info 1modinfo kvm whether CPU support hardware virtualization 12egrep -c '(vmx|svm)' /proc/cpuinfokvm-ok install kvm install libvirt and qemu packages 1234sudo apt install qemu qemu-kvm libvirt-bin bridge-utilsmodprobe kvm #load kvm modulesystemctl start libvirtd.service #vrish iface-bridge ens33 virbr0 #create a bridge network mac address add current user to libvirtd group 123sudo usermod -aG libvirtd $(whoami)sudo usermod -aG libvirt-qemu $(whoami)sudo reboot network in kvmdefault network is NAT(network address transation), when you create a new virtual machine, this forwards network traffic through your host system; if the host is connected to the Internet, then your vm have Internet access. VM manager also creates an Ethernet bridge between the host and virtual network, so can ping IP address of VM from host, also ok on the other way. List of network cards go to /sys/class/net there are a few nic: 123456789lrwxrwxrwx 1 root root 0 Mar 24 16:18 docker0 -> ../../devices/virtual/net/docker0lrwxrwxrwx 1 root root 0 Mar 24 16:18 docker_gwbridge -> ../../devices/virtual/net/docker_gwbridgelrwxrwxrwx 1 root root 0 Mar 24 16:18 eno1 -> ../../devices/pci0000:00/0000:00:1f.6/net/eno1lrwxrwxrwx 1 root root 0 Mar 24 16:18 enp4s0f2 -> ../../devices/pci0000:00/0000:00:1c.4/0000:02:00.0/0000:03:03.0/0000:04:00.2/net/enp4s0f2lrwxrwxrwx 1 root root 0 Mar 24 16:18 lo -> ../../devices/virtual/net/lolrwxrwxrwx 1 root root 0 Mar 24 16:18 veth1757da9 -> ../../devices/virtual/net/veth1757da9lrwxrwxrwx 1 root root 0 Mar 24 16:18 vethd4d0e7f -> ../../devices/virtual/net/vethd4d0e7flrwxrwxrwx 1 root root 0 Mar 24 16:18 virbr0 -> ../../devices/virtual/net/virbr0lrwxrwxrwx 1 root root 0 Mar 24 16:18 virbr0-nic -> ../../devices/virtual/net/virbr0-nic multi interfaces on same MAC addresss when a switch receives a frame from an interface, it creates an entry in the mac-address table with the source mac and interface. it the source mac is known, it will update the table with the new interface. so bascially if you assign the mac address of an external-network-avialable NIC-A to the special vm, NIC-A is lost. virbr0 the default bridge NIC of libvirt is virbr0. bridge network means the guest and host share the same physical Network Cards, as well as offer the guest a special IP, which can be used to access the guest directly. the virbr0 do network address translation(NAT), basically transfer the internal IP address to an external IP address, which means the internal IP address is un-visiable from outside. 
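for a quick way to inspect this NAT setup from code rather than ifconfig, here is a minimal sketch with the libvirt Python bindings (an assumption for illustration; the post itself only uses virsh and brctl). it lists the libvirt networks and dumps the default network definition that creates virbr0.
# a minimal sketch, assuming libvirt-python is installed; it inspects the
# libvirt "default" NAT network behind virbr0, mirroring
# `virsh net-list` / `virsh net-dumpxml default`.
import libvirt

conn = libvirt.open("qemu:///system")
for net in conn.listAllNetworks():
    print(net.name(), "active:", bool(net.isActive()), "bridge:", net.bridgeName())

default = conn.networkLookupByName("default")
print(default.XMLDesc(0))   # shows the <forward mode='nat'/> element and the DHCP range
conn.close()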
to add virbr0 back, if it was deleted previously: brctl addbr virbr0; brctl stp virbr0 on; brctl setf virbr0 0; ifconfig virbr0 192.168.122.1 netmask 255.255.255.0 up. to disable or delete virbr0: virsh net-destroy default; virsh net-undefine default; service libvirtd restart; ifconfig. after starting the vm, the bridge network can be checked with: virsh domiflist vm-name; virsh domifaddr vm-name. and we can log in to the vm (after assigning the current user to the libvirt group) and check that NAT is working: ssh 192.168.122.1; ping www.bing.com; ping 10.20.xxx.xxx # ping the host external IP. basically the vm can access external websites, but external hosts can't reach the vm. related virsh commands: attach-interface/detach-interface/domiflist.
create vm: creating a virtual machine can be done either through virt-install or through a config.xml:
virt-install: virt-install depends on the system python and pip. if the current python version is 2.7, it may warn and return -1 because of a missing module, so make sure #PYTHONPATH# points to the correct path if you have multiple pythons on the system. virt-install also has to run as root. then a virtual machine can be started with the following command options:
sudo virt-install \
  --name v1 \
  --ram 2048 \
  # --cdrom=ubuntu-16.04.3-server-amd64.iso \
  --disk path=/var/lib/libvirt/images/ubuntu.qcow2 \
  --vcpus 2 \
  --virt-type kvm \
  --os-type linux \
  --os-variant ubuntu16.04 \
  --graphics none \
  --console pty,target_type=serial \
  --location /var/lib/libvirt/images/ubuntu-16.04.3-server-amd64.iso \
  --network bridge:virbr0 \
  --extra-args console=ttyS0
during the installation, the process looks very much like a Linux installation on a bare machine; in that sense it is like installing a second OS alongside the host. during the installation there may be an error "failed to load installer component libc6-udeb", which may be due to the iso or img missing a component.
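the same define-and-start flow used in the config.xml route below can also be driven from the libvirt Python bindings. this is only a sketch under that assumption; vm1.xml is a placeholder for the domain definition built in the next section, and nothing here comes from the original post.
# a minimal sketch, assuming libvirt-python is installed and vm1.xml contains
# a domain definition like the one shown below; it mirrors
# `virsh define vm1.xml` followed by `virsh start v1`.
import libvirt

with open("vm1.xml") as f:          # vm1.xml is a placeholder path
    domain_xml = f.read()

conn = libvirt.open("qemu:///system")
dom = conn.defineXML(domain_xml)    # persistent define, like `virsh define`
dom.create()                        # boot the defined domain, like `virsh start`
print("started:", dom.name(), "id:", dom.ID())
conn.close()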
config.xml create volumes go to /var/lib/libvirt/images, and create volume as following: 1qemu-img create -f qcow2 ubuntu.qcow2 40G check qemu-kvm & qemu-img introduction add vm image cp ubuntu.iso to /var/lib/libvirt/images as well: 12ubuntu.qcow2ubuntu-16.04.3-server-amd64.iso vm.xmlfollow an xml sample: 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152<domain type='kvm'> <name>v1</name> <memory>4048576</memory> <currentMemory>4048576</currentMemory> <vcpu>2</vcpu> <os> <type arch='x86_64' machine='pc'>hvm</type> <boot dev='cdrom'/> </os> <features> <acpi/> <apic/> <pae/> </features> <clock offset='localtime'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>destroy</on_crash> <serial type='pty'> <target port='0' /> </serial> <console type='pty' > <target type='serial' port='0' /> </console> <devices> <emulator>/usr/bin/qemu-system-x86_64</emulator> <disk type='file' device='disk'> <driver name='qemu' type='qcow2'/> <source file='/var/lib/libvirt/images/ubuntu.qcow2'/> <target dev='hda' bus='ide'/> </disk> <disk type='file' device='cdrom'> <source file='/var/lib/libvirt/images/ubuntu-16.04.3-server-amd64.iso'/> <target dev='hdb' bus='ide'/> </disk> <interface type='bridge' > <mac address='52:54:00:98:45:3b' /> <source bridge='virbr0' /> <model type='virtio' /> </interface> <serial type='pty'> <target port='0' /> </serial> <console type='pty'> <target type='serial' port='0' /> </console> <input type='mouse' bus='ps2'/> <graphics type='vnc' port='-1' autoport='no' listen = '0.0.0.0' keymap='en-us'> <listen type='address' address='0.0.0.0' /> </graphics> </devices> </domain> a few tips about the xml above: \ component is necessary for network interface. if not assign a special mac address in the interface. since we had define virbr0, an automatic mac address will be assigned, which is unique from the host machine’s IP, but if ssh login to the guest (ssh username@guest_ip), it actually can ping host machine’s iP or any external ip(www.being.com) \ compoennt, is setting for console. finally run the following CLI to start vm: v1: 123virsh define vm1.xml virsh start ubuntu(the image)virsh list libvertlibvert is a software package to manage vm, including libvirtAPI, libvirtd(daemon process), and virsh tool. 12sudo systemctl restart libvirtdsystemctl status libvirtd only when libvirtd service is running, can we manage vm through libvert. all configure of the vm is stored ad /etc/libvirt/qemu. for virsh there are two mode: immediate way e.g. in host shell virsh list interactive shell e.g. by virsh to virsh shell common virsh commands123456789101112131415161718virsh <command> <domain-id> [options]virsh uri #hypervisor's URIvirsh hostnamevirsh nodeinfo virsh list (running, idel, paused, shutdown, shutoff, crashed, dying)virsh shutdown <domain>virsh start <domain>virsh destroy <domain>virsh undefine <domain>virsh create #through config.xmlvirsh connect #reconnect to hypervisor virsh nodeinfo virsh define #file domainvirsh setmem domain-id kbs #immediatelyvirsh sertmaxmem domain-id kbsvirsh setvcpus domain-id count virsh vncdisplay domain-id #listed vnc portvirsh console <domain> virsh network commands host configure Every standard libvirt installation provides NAT based connectivity to virtual machines out of the box. 
This is the so called ‘default virtual network’ 12345virsh net-list virsh net-define /usr/share/libvirt/networks/default.xmlvirsh net-start defaultvirsh net-info defaultvirsh net-dumpxml default When the libvirt default networkis running, you will see an isolated bridge device. This device explicitly does NOT have any physical interfaces added, since it uses NAT + forwarding to connect to outside world. Do not add interfaces. Libvirt will add iptables rules to allow traffic to/from guests attached to the virbr0 device in the INPUT, FORWARD, OUTPUT and POSTROUTING chains. if default.xml is not found, check fix missing default network, default.xml is sth like: 123456789101112<network> <name>default</name> <uuid>9a05da11-e96b-47f3-8253-a3a482e445f5</uuid> <forward mode='nat'/> <bridge name='virbr0' stp='on' delay='0'/> <mac address='52:54:00:0a:cd:21'/> <ip address='192.168.122.1' netmask='255.255.255.0'> <dhcp> <range start='192.168.122.2' end='192.168.122.254'/> </dhcp> </ip></network> then run: 123sudo virsh net-define --file default.xmlsudo virsh net-start defaultsudo virsh net-autostart --network default if bind default to virbr0 already, need delete this brige first. guest configure add the following to guest xml configure: 1234<interface type='network'> <source network='default'/> <mac address='00:16:3e:1a:b3:4a'/></interface> more details can check virsh networking doc snapshotssnapshots used to save the state(disk mem, time..) of a domain create a snapshot for a vm 123virsh snapshot-create-as --domain test_vm \--name "test_vm_snapshot1" \ --description "test vm snapshot " list all snapshots for vm 1virsh snapshot-list test_vm display info about a snapshot 1234567virsh snapshot-info --domain test_vm --snapshotname test_vm_snapshot1``` * delete a snapshot ```shvirsh snapshot-delete --domain test_vm --snapshotname test_vm_shapshot1 manage volumes create a storage volume 123virsh vol-create-as default test_vol.qcow2 10G# create test_vol on the deafult storage pooldu -sh /var/lib/libvirt/images/test_vol.qcow2 attach to a vm attache test-vol to vm test 123virsh attach-disk --domain test \--source /var/lib/libvirt/images/test-vol.qcow2 \--persistent --target vdb which can be check that the vm has added a block device /dev/vdb 12ssh test #how to ssh to vm lsblk --output NAME,SIZE,TYPE or directly grow disk image: 1qemu-img resize /var/lib/libvirt/images/test.qcow2 +1G detach from a vm 1virsh detach-disk --domain test --persistent --live --target vdb delete a vm 123virsh vol-delete test_vol.qcow2 --pool defaultvirsh pool-refresh defaultvirsh vol-list default fs virsh commands12virt-ls -l -d <domain> <directory>virt-cat -d <domain> <file_path> referkvm introduction in chinese kvm pre-install checklist Linux network configuration kvm installation official doc creating vm with virt-install install KVM in ubuntu jianshu: kvm network configure cloudman: understand virbr0 virsh command refer qcow2 vs raw]]></content>
<tags>
<tag>kvm</tag>
</tags>
</entry>
<entry>
<title><![CDATA[real-time matlab/simulink code generation]]></title>
<url>%2F2020%2F03%2F20%2Freal-time-matlab-simulink-code-generation%2F</url>
<content type="text"><![CDATA[real-time matlab code generationbackgroudPython and Linux vs Matlab and Windows, I prefer the front. but as a teamwork, I have to understand how matlab/Simulink code generation works, especially with real-time model. Simulink Coderpreviously named as Real-Time workshop(rtw). real time model data structureaccess rtModel data by using a set of macros analogous to the ssSetxxx and ssGetxxx macros that S-functions use to access SimStruct data, including noninlined S-functions compiled by the code generator. You need to use the set of macros rtmGetxxx and rtmSetxxx to access the real-time model data structure. The rtModel is an optimized data structure that replaces SimStruct as the top level data structure for a model. The rtmGetxxx and rtmSetxxx macros are used in the generated code as well as from the main.c or main.cpp module. Usage of rtmGetxxx and rtmSetxxx macros is the same as for the ssSetxxx and ssGetxxx versions, except that you replace SimStruct S by real-time model data structure rtM. rtm macro description rtmGetdX(rtm) get the derivatives of block continous states rtmGetNumSampleTimes(RT_MDL rtM) Get the number of sample times that a block has rtmGetSampleTime(RT_MDL rtM, int TID) Get task sample time rtmGetStepSize(RT_MDL) Return the fundamental step size of the model rtmGetT(RT_MDL,t) Get the current simulation time rtmGetErrorStatus(rtm) Get the current error status code generation to used externally :1) Install the Real-Time Workshop (RTW) Toolbox for MATLAB; 2) Create the Simulink Model and Prepare it for autocoding; 3) Correctly configure the RTW options and include a *.tcl file; 4) Build any S-Functions of the model; 5) Build the model (generate autocode including makefile); 6) Tune up the makefile with any missing options/libraries/files; 7) Integrate autocoded model in RTEMS using the wrapper. ert_main()the following is a common sample of real-time model generated C code. 1234567891011rt_OneStep(void){ simulation_custom_step() ;}main(){ simulation_initialize(); while(rtmGetErrorStatus(xx_M) == (NULL)){ rt_OneStep(); } simulation_terminate(); return 0; if no time interupt setting up, there is a Warning: The simulation will run forever. Generated ERT main won’t simulate model step behavior. To change this behavior select the ‘MAT-file logging’ option. from real-time Matlab/Simulink model to C/C++ code, we need manually set the while-loop break statement. for example, we can run 100 times or based on some event-trigger. Timing12345678 struct { uint16_T clockTick1; //base rate counter (5s) struct { uint16_T TID[2]; } TaskCounters; //subtask counter (0.02s) } Timing; currentTime = Timing.clockTick1 * 5.0 ; absolute timer for sample time: [5.0s, 0.0s]. the resolution of this integer timer is 5.0, which is the step size of the task. so bascially, assuming to run one step need physcially 5s, but inside the module, each internal step is 0.02s. if we run a test scenario with 20s, basically we have 4 clockTick1 and inside each clockTick1, we have 250 times internal steps. a few modificationthe following is a few modification based on the auto-generated C code: redefine data structure matlab coder use most C structure to package signals. 
in our adas model, most signals have similar inner items, so first I'd like to define a parent structure, then define all the other adas signals with typedef:
typedef struct parent adas_sig1 ;
typedef struct parent adas_sig2 ;
typedef struct parent adas_sig3 ;
typedef struct parent adas_sig4 ;
with the parent struct, we can define one method to handle all adas signals. add the trigger model in rt_OneStep():
checkTriggerSigs(&FunctionSignal, outputs_name);
int idx = 0;
for(; idx<outputs_name.size(); idx++){
  if ( outputs_name[idx] == "adas_sig1") {
    double *vals = gd1_struc_2arr(adas_sig1) ;
    outputs_data.push_back(vals);
  } else if( outputs_name[idx] == "adas_sig2") {
    double *vals = gd1_struc_2arr(adas_sig2) ;
    outputs_data.push_back(vals);
  }
  // ws msg send model
}
add sim_time to break the loop:
real_T sim_time = 5.0 ;
fflush((NULL));
while (rtmGetErrorStatus(OpenLoopSimulation_M) == (NULL) && (Timing.clockTick1) * 5.0 <= sim_time) {
  rt_OneStep();
}
in summary, the work above is the core modification needed to turn a real-time matlab/simulink model with a trigger model into C/C++ code, which can then be integrated into the massively parallel adas test pipeline. refer: matlab code generation from rtems; the joy of generating C code from MATLAB; matlab coder introduction; matlab code doc; real-time model data structure; generate code from rate-based model; schedule a subsystem multiple times in a single step]]></content>
</entry>
<entry>
<title><![CDATA[massively adas test pipeline]]></title>
<url>%2F2020%2F03%2F18%2Fmassively-adas-test-pipeline%2F</url>
<content type="text"><![CDATA[backgroundADAS engineers in OEM use Matlab/Simulink(model based design) to develop the adas algorithms, e.g. aeb. Simulink way is enough for fuction verification in L2 and below; for L2+ scenario, often need to test these adas functions in system level, basically test it in scenarios as much as possible, kind of L3 requirements. one way to do sytem level verification is through replay, basically the test vehicle fleet can collect large mount of data, then feed in a test pipeline, to check if these adas functions are triggered well or missed. for replay system test, we handle large mount of data, e.g. Pb, so the Simulink model run is too slow. the adas test pipeline with the ability to run massively is required. the previous blog high concurrent ws server mentioned about the archetecture of this massively adas test pipeline: with each adas model integrated with a websocket client, and all of these ws clients talk to a websocekt server, which has api to write database. encode in Cthe adas simulink model can be encode in c, which of course can encode as c++, while not powerful yet. as in C is more scalable then simulink/matlab. matlab/simulink encode has a few choices, massively run is mostly in Linux-like os, here we choose ert env to encode the model, after we can build and test as: 123gcc -c adas_model.c -I . gcc -c ert_main.c gcc ert_main.o adas_model.c -o mytest json-cas all messages in adas model in c is stored in program memory, first thing is to serialize these to json. here we choose json-c: install on local Ubuntu 1234sudo apt-get install autoconf automake libtool sh autogen.sh./configuremake && make install then the json-c header is at: /usr/local/include/json-c and the libs at: /usr/local/lib/libjson-c.so *.al when using we can add the following flags: 123JSON_C_DIR=/path/to/json_c/installCFLAGS += -I$(JSON_C_DIR)/include/json-cLDFLAGS+= -L$(JSON_C_DIR)/lib -ljson-c the json object can be created as: 12345678910struct json_object *json_obj = json_object_new_object();struct json_object *json_arr = json_object_new_array();struct json_object *json_string = json_object_new_string(name);int i=0;for(; i<20; i++){ struct json_object *json_double = json_object_new_double(vals[i]); json_object_array_put_idx(json_arr, i, json_double);}json_object_object_add(json_obj, "name", json_string);json_object_object_add(json_obj, "signals", json_arr); modern c++ json libs are more pretty, e.g. jsoncpp, rapidJSON, json for modern c++ wsclient-cthe first ws client I tried is: wsclient in c, with default install, can find the headers and libs, separately at: 12/usr/local/include/wsclient/usr/local/lib/libwsclient.so or *.a when using: 1gcc -o client wsclient.c -I/usr/local/include -L/usr/local/lib/ -lwsclient onopen()as we need send custom message through this client, and the default message was sent inside onopen(), I have to add additionaly argument char\* into the default function pointer onopen 123456789101112int onopen(wsclient *c, char* message) { libwsclient_send(c, message); return 0;} void libwsclient_onopen(wsclient *client, int (*cb)(wsclient *c, char* ), char* msg) { pthread_mutex_lock(&client->lock); client->onopen = cb; pthread_mutex_unlock(&client->lock);} if(pthread_create(&client->handshake_thread, NULL, libwsclient_handshake_thread, (void *)client)) { and the onopen callback is actually excuted inside handshake thread, in which is not legacy to pass char* message. 
further as there is no global alive status to tell the client-server channel is alive, to call libwsclient_send() in another thread sounds trouble-possible. looks wsclient-c is limited, I transfer to wsclient c++. but need make sure model in c workable with g++. wsclient c++websocketpp is a header only C++ library, there is no libs after built, but it is depends on boost, so to use this lib, we can add header and libs as following: 12/usr/local/include/websocketpp /usr/lib/x86_64-linux-gnu/libboost_*.so I am using the wsclient from sample, and define a public method as the client process: 123456789101112131415161718192021222324int client_process(std::string& server_url, std::string& message){ websocket_endpoint endpoint; int connection_id = endpoint.connect(server_url); if (connection_id != -1) { std::cout << "> Created connection with id " << connection_id << std::endl; } connection_metadata::ptr mtdata = endpoint.get_metadata(connection_id); //TODO: optimized with this sleeping time boost::this_thread::sleep(boost::posix_time::milliseconds(200)); int retry_num = 0; while(mtdata->get_status() != "Open" && retry_num < 100){ std::cout << "> connected is not open " << connection_id << std::endl; boost::this_thread::sleep(boost::posix_time::milliseconds(100)); connection_id = endpoint.connect(server_url); mtdata = endpoint.get_metadata(connection_id); } if(mtdata->get_status() != "Open") { std::cout << "retry failed, exit -1" << std::endl ; return 0; } endpoint.send(connection_id, message); std::cout << message <<" send successfully" << std::endl ; there is more elegent retry client solution. to build our wsclient: 1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950 g++ wsclient.cpp -o ec -I/usr/local/include -L/usr/lib/x86_64-linux-gnu -lpthread -lboost_system -lboost_random -lboost_thread -lboost_chrono ``` ## ws server in pythonwe implemented a simple ws server with [websockets]():```python#!/usr/bin/env python3import asyncioimport websocketsimport json from db_ model import dbwriter, adas_msg, Baseclass wsdb(object): def __init__(self, host=None, port=None): self.host = host self.port = port self.dbwriter_ = dbwriter() async def process(self, websocket, path): try: raw_ = await websocket.recv() jdata = json.loads(raw_) orm_obj = adas_msg(jdata) try: self.dbwriter_.write(orm_obj) self.dbwriter_.commit() except Exception as e: self.dbwriter_.rollback() print(e) except Exception as e: print(e) greeting = "hello from server" await websocket.send(greeting) print(f"> {greeting}") def run(self): if self.host and self.port : start_server = websockets.serve(self.process, self.host, self.port) else: start_server = websockets.serve(self.process, "localhost", 8867) asyncio.get_event_loop().run_until_complete(start_server) asyncio.get_event_loop().run_forever() if __name__=="__main__": test1 = wsdb() test1.run() the simple orm db_writer is from sqlalchemy model. in makefile12345678910111213141516171819202122CC=g++JSONC_IDIR=/usr/local/includeCFLAGS=-I. 
-I$(JSONC_IDIR)OPEN_LOOP_DEPS= rtwtypes.h adas_test.h LDIR=-L/usr/local/lib/LIBS=-ljson-cBOOST_LDIR=-L/usr/lib/x86_64-linux-gnuBOOST_LIBS= -pthread -lboost_system -lboost_random -lboost_thread -lboost_chronoJSON_DEPS= wsclientpp.hobj = adas_test.o ert_main.o src=*.c$(obj): $(src) $(DEPS) $(JSON_DEPS) $(CC) -c $(src) $(CFLAGS)mytest: $(obj) $(CC) -o mytest $(obj) $(CFLAGS) $(LDIR) $(LIBS) $(BOOST_LDIR) $(BOOST_LIBS).PHONY: cleanclean: rm -f *.o so now we have encode adas simulink model to C code, and integrate this model C with a websocket client, which can talk to a ws server, which further write to database, which further can be used an data analysis model. we can add front-end web UI and system monitor UI if needed, but so far this adas test pipeline can support a few hundred adas test cores to run concurrently. refer pthread_create with multi args wscpp retry client]]></content>
<tags>
<tag>adas</tag>
</tags>
</entry>
<entry>
<title><![CDATA[stanford cs linked list problems]]></title>
<url>%2F2020%2F03%2F14%2Fstanford-cs-linked-list-problems%2F</url>
<content type="text"><![CDATA[backgroundstanford cs gives me some confidence to deal with linked list. linked list basicsbasic pointers as well as previous blog: a pointer stores a reference to another variable(pointee). what stored inside pointer is an reference to its pointee’s type. NULL pointer, points to nothing. pointer assignment p=q, makes the two pointers point to the same pointee, namely the two pointers point to the same pointee memory. it’s a good habit to remember to check the empty list case. define the Linked-List structure: 123456struct ListNode { int val ; ListNode *next; ListNode(int x): val(x), next(NULL){}};typedef ListNode node_ ; iterate over the list with a local pointer 123456node_ *current = head ;while(current){ current = current->next;}for(current=head; current!=NULL; current=current->next){}; push a new node to the front of the list 12345678910111213void Push(ListNode** headRef, int val){ ListNode newNode = new ListNode(val); newNode.next = *headRef ; *headRef = newNode ; }``` * changes the head pointer by a reference pointer```c++void changetoNull(ListNode** head){ *head = NULL;} build up a list by adding nodes at its head 1234567891011121314151617181920212223242526272829303132333435363738394041ListNode* AddatHead(){ ListNode *head = NULL ; for(int i=1; i<5; i++){ Push(&head, i); } return head;}``` which gives a list {4, 3, 2, 1} * build up a list by appending nodes to the tail```c++ListNode* BuildwithSpecialCase(){ ListNode* head = NULL ; ListNode* tail ; Push(&head, 1); tail = head ; for(int i=2; i<5; i++){ Push(&(tail->next), i); tail = tail->next; } return head; }``` which gives a list {1, 2, 3, 4}* build up a list with dummy node ```c++ListNode* BuildWithDummy(){ ListNode dummy = new ListNode(0); ListNode* tail = &dummy ; for(int i=1; i<5; i++){ Push(&(tail->next), i); tail = tail->next; } return dummy.next ; } which returns a list {1, 2, 3, 4} appendNode(), add new node at the tail 12345678910111213141516171819202122232425262728293031323334ListNode* appendNode(ListNode** headRef, int val){ ListNode *current = *headRef ; ListNode newNode = new ListNode(num); if(!current){ current = &newNode ; }else{ while(current->next){ current = current->next ; } current->next = &newNode ; }}``` * copyList()```c++ListNode* copyList(ListNode* head){ ListNode *current = head ; ListNode *newList = NULL ; ListNode *tail = NULL ; while(current){ if(!newList){ newList = &(new ListNode(current->val)); tail = newList ; }else{ tail->next = &(new ListNode(current->val)); tail = tail->next ; } } return newList ; } copyList() recursive 12345678ListNode* CopyList(ListNode* head){ if(!head) return NULL; else{ ListNode *current = head ; ListNode *newList = &(ListNode(current->val)); newList->next = CopyList(current->next); }} linked list problems further InsertNth() 12345678910void insertNth(node_ **head, int index, int data){ if(index==0) Push(head, data); else{ node_ * cur = *head ; for(int i=0; i<index-1; i++){ cur = cur->next ; } Push(&*cur->next), data); }} sortedInsert() 123456789101112131415161718192021222324252627// be notified, here we using **, to use the pointer of the head, as the head node may be updated void sortedInsert(node_ **head, node* newNode){ if(*head == NULL || head->val >= newNode->val){ newNode->next = *head; *head = newNode; }else{ node_ *cur = *head ; while(cur->next && cur->next->val < newNode->val){ cur = cur->next; } newNode->next = cur->next ; cur->next = newNode ; }}//with dummy headvoid sortedInsert(node_ **head, node* newNode){ node_ dummy(0); 
dummy.next = *head ; node_ *cur = &dummy ; while(cur->next && cur->next->val < newNode->val){ cur = cur->next; } newNode->next = cur->next ; cur->next = newNode ; *head = dummy->next;} insertSort() given as unsorted list, and output with an sorted list 123456789101112void InsertSort(node_ ** head){ node_ *result = NULL ; node_ *cur = *head ; node_ *next ; while(cur){ next = cur->next ; // ticky- note the next pointer, before we change it sortedInsert(result, cur); cur = next ; } *head = result ;} append() append list b to the end of list a 123456789101112void append(node_ **a, node_ **b){ node_ * cur ; if( *a == NULL){ *a = *b ;} else{ cur = *a ; while(cur->next){ cur = cur->next; } cur->next = *b ; } *b = NULL ; } frontBackSplit() given a list, split it into two sublists: one for the front half, one for the back half 1234567891011121314151617181920212223242526272829303132333435363738void frontBackSplit(node_ **head, node_ **front, node_ **back){ int len = length(head); node_ *cur = *head ; if(len < 2){ *front = *head; *back = NULL ; }else{ int half = len/2; for(int i=0; i<half-1; i++){ cur = cur->next; } *front = *head; *back = cur; cur = NULL ; }}void frontBackSplit2(node_ **head, node_ **front, node_ **back){ node_ *fast, *slow ; if(*head == NULL || (*head)->next == NULL){ *front = *head ; *back = NULL ; }else{ slow = head; fast = head->next ; while(fast){ fast = fast->next; if(fast){ fast = fast->next; slow = slow->next; } } *front = *head; *back = slow->next ; slow->next = NULL ; }} removeDuplicates() remove duplicated node from a sorted list 123456789101112131415161718192021222324252627282930313233void removeDuplicates(node_ ** head){ node_ *slow, *fast ; if(head == NULL || head->next == NULL){ return ; } slow = *head ; fast = (*head)->next ; while(fast){ if(slow->val == fast->val){ node_ * needfree = fast ; fast = fast->next ; free(needfree); }else{ slow = slow->next; fast = fast->next; } }}void removeDuplicate(node_ **head){ node_ *cur = *head; if(cur==NULL) return ; while(cur->next){ if(cur->val == cur->next->val){ node_ *nextNext = cur->next->next ; free(cur->next); cur->next = nextNext; }else{ cur = cur->next; } }} moveNode() given two list, remove the front node from the second list, and push it onto the front of the first. // a = {1, 2, 3}; b = {1, 2, 3} => a={1, 1, 2, 3}, b={2, 3} 123456void moveNode(node_ **dest, node_ **source){ node_ *newNode = *source ; *source = newNode->next ; newNode->next = *dest ; *dest = newNode ; } alternatingSplit() given a list, split its nodes into two shorter lists. 
if we number the elements 0, 1, 2, …, then all the even elements go to the first sublist, and all the odd elements go to tthe second 12345678910111213void alternatingSplit(node_ *source, node_ **ahead, node_ **bhead){ node_ *a = NULL ; node_ *b = NULL ; node_ *cur = *source ; while(cur){ moveNode(&a, &cur); if(cur){ moveNode(&b, &cur); } } *ahead = a ; *bhead = b;} shuffleMerge() given two list, merge their nodes together to make one list, takign nodes alternatively between the two lists 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748node_* shufflMerge(node_ *a, node_ *b){ node_ *res = NULL ; int i=0; while(a || b){ if(i % 2 == 0 && a){ moveNode(&res, &a); }else if(b){ moveNode(&res, &b); } }} //but this gives the tail to front ordernode_* shufflMerge2(node_ *a, node_ *b){ node_ dummy(0); node *tail = &dummy ; while(1){ if(a==NULL){ tail->next = b; break ; }else if(b == NULL){ tail->next = a; break; }else{ tail->next = a ; tail = a ; a = a->next ; tail->next = b; tail = b; b = b->next; } } return dummy.next ;}// recursive ?node_* shufflMerge3(node_ *a, node_ *b){ node_ *res ; node_ *recur ; if(a==NULL) return b ; else if(b=NULL) return a ; else{ recur = shuffleMerge3(a->next, b->next); res = a ; a->next = b ; b->next = recur ; return res; }} sortedMerge() given two sorted in incresing order listes, merge into one in increasing order 1234567891011121314151617181920212223242526272829303132333435363738node_ *sortedMerge(node_ *a, node_ *b){ node_ dummy(0); node_ *tail = &dummy ; dummy.next = NULL ; while(1){ if(a==NULL){ tail->next = b ; break; }else if(b==NULL){ tail->next = a; break ; } if(a->val <= b->val){ moveNode(&(tail->next), &a); }else{ moveNode(&(tail->next), &b); } tail = tail->next; } return dummy.next ;}// how this works?node_ *sortedMerge2(node_ *a, node_ *b){ node_ *result = NULL ; if(a==NULL) return b; if(b==NULL) return a; if(a->val <= b->val){ result = a; result->next = sortedMerge2(a->next, b); }else{ result = b; result->next = sortedMerge2(a, b->next); } return result;} mergeSort() 12345678910111213void mergeSor(node_ ** headRef){ node_ *head = *headRef ; node_ *a, *b; if( (head==NULL) || (head->next == NULL)){ return ; } frontBacksplit(head, &a, &b); mergeSort(&a); mergeSort(&b); *headRef = sortedMerge(a,b):} reverse() 123456789101112131415void reverse(node_ **head){ node_ *res = NULL; node_ *cur = *head ; node_ *next ; while(cur){ next = cur->next; cur->next = res ; res = cur ; cur = next ; } *head = res;} recursive reverse() // concerned 1234567891011121314void recursiveReverse(node_ **head){ ndoe_ *first, *rest ; if(*head == NULL) return ; first = *head ; rest = first->next ; if(rest == NULL) return ; recursiveReverse(&rest); first->next->next = first ; first->next = NULL ; *head = rest ;} referencelinked list problems linked list basics from standford cs]]></content>
<tags>
<tag>leetCode</tag>
</tags>
</entry>
<entry>
<title><![CDATA[hypervisor and gpu virtualization]]></title>
<url>%2F2020%2F03%2F14%2Fhypervisor-and-gpu-virtualization%2F</url>
<content type="text"><![CDATA[backgroundto build a private data center for ads training and SIL. a few concepts there: linux remote desktopLinux GUI is based on x-protocol, which is kind of CPU based, during deploying gpu-based app in docker swarm, I have studied xserver, ssh -X channels give the way for visualization desktop and simple apps(e.g. glxgears) among xserver and xclient. while for really gpu-densive app, e.g. game rendering, machine learning, ssh -X can’t really use gpu resource well, which needs further support from device management mechanism from docker or k8s. virutal machine/ cloud desktopVM is based on hypervisor, known as a virtual machine monitor(VMM), a type of virtualization software that supports the creation and management of VMs, hypervisor translate requests between the physical and virtual resources, making virtualization possible. A hypervisor allows one host computer to support multiple guest VMs by virtually sharing its resources, like memory and processing. Generally, there are two types of hypervisors. Type 1 hypervisors, called “bare metal,” run directly on the host’s hardware. Type 2 hypervisors, called “hosted,” run as a software layer on an operating system, like other computer programs, the most common e.g. vmware, citrix. When a hypervisor is installed directly on the hardware of a physical machine, between the hardware and the operating system (OS), it is called a bare metal hypervisor, which separates the OS from the underlying hardware, the software no longer relies on or is limited to specific hardware devices or drivers. VDIa type of vm application is virtual desktop inftrastructure(vdi), VDI hosts desktop environments on a centralized server and deploys them to end-users on request. this process also known as server virtualization VDI is not necessarily requires a physical gpu, without a gpu, we can still run vmware, but the performance is poor. but for many other VM usage, beyond VDI, we still need ways to access GPU devices in VM. GPU techs in VMvwmare has an introduction about gpu tech in vm: software 3D. using software to mimic GPU calculating. all 3d calculating is done by CPU vsga(vitual shared graphics acceleration), each virtual desktop has a vsga driver, which has a ESXi module, inside which has a GPU driver, which can call the physical gpu, but this shared mode is through Hypervisor vdga(virtual dedicated graphics accleration), or pass through mode, the physical GPU is assigned to a special virtual desktop for one user only. vgpu, split physical GPU to a few virtual-GPU, each virtual desk can have one vGPU. VM vs dockerdocker is a light-weight virtualization way, which don’t need hypervisor, and the app in docker is access the host physical devices directly, making docker looks like a process, rather than a virtual OS; and scientfic computing apps run in VM is actually using the virtual CPU from hypervisor, which bypass the cpu optimzation used for math calcualting. so usually a physical machine can start hundres or thousands docker, but can only run a few VM. referlinux remote desktop understand vdi gpu tech nvidia vgpu tech in vSphere GPU hypervisors on OpenStack containers vs hypervisors]]></content>
</entry>
<entry>
<title><![CDATA[pure c/c++ pointer]]></title>
<url>%2F2020%2F03%2F13%2Fpure-c-c-pointer%2F</url>
<content type="text"><![CDATA[pointer assignment12345678910111213141516171819202122232425262728293031323334int main(){ int a = 10 ; int *pa = &a ; cout << "pa add " << &pa << "pa val " << *pa << endl ; int b = 22; int *pb = &b ; cout << "pb add " << &pb << "pb val " << *pb << endl ; int d = 100 ; int *pd = &d ; cout << "pd add " << &pd << "pd val " << *pd << endl ; int *pc = pb ; //pointer assignment cout << "pc add " << &pc << "pc val " << *pc << endl ; pb = pd ; cout << "pb add " << &pb << "pb val " << *pb << endl ; cout << "pd add " << &pd << "pd val " << *pd << endl ; pd = pa ; cout << "pd add " << &pd << "pd val " << *pd << endl ; pa = pc ; cout << "pa add " << &pa << "pa val " << *pa << endl ; cout << "pc add " << &pc << "pc val " << *pc << endl ; return 0;} /* pb add 0x7fff5e744b30pb val 22pd add 0x7fff5e744b28pd val 100pc add 0x7fff5e744b20pc val 22pb add 0x7fff5e744b30pb val 100pd add 0x7fff5e744b28pd val 100pd add 0x7fff5e744b28pd val 10pa add 0x7fff5e744b40pa val 22pc add 0x7fff5e744b20pc val 22*/ pointer assigment used a lot in linked list problems, the sample above is a pointer solution for linked list reverse. consider pointer as a container with an address to another object. pointer assignment, e.g. pointerA = pointerB only changes the content in the container. but the address of the container itself doesn’t change. and with *(dereference) the pointer, we can see the content is changed. further, taken pc, pb, pd as another example.12pc = pb ;pb = pd; the first line will make container pc to store what is stored in container pb, in another word, the first line will make pc point to the address, which is stored in pb. and the second line will then put what’s stored in container pd to container pb. after this two updates, pc points to the original content in pb; pb and pd points to the same content. obviously, what’s inside pc now, has nothing to do with pointer pb. pointer++ forwardthe basic scenario is as following, will p2 move forward as well ? 123int *p1 = &int_var ;int *p2 = p1; p1++ ; we can see from the following test.c, p2 won’t move forward as p1++. 1234567891011121314151617181920212223242526int main(){ int a[4] = {1, 2, 3, 4 } ; cout << " a addr " << &a << " a val " << *a << endl ; int *p = a ; int *q = p ; cout << " p addr " << &p << " p val " << *p << endl ; cout << " q addr " << &q << " q val " << *q << endl ; for(int i=0; i<3; i++){ q++; } cout << " a addr " << &a << " a val " << *a << endl ; cout << " p addr " << &p << " p val " << *p << endl ; cout << " q addr " << &q << " q val " << *q << endl ; return 0;} /* a addr 0x7fff5e968bb0 a val 1 p addr 0x7fff5e968b40 p val 1 q addr 0x7fff5e968b38 q val 1 a addr 0x7fff5e968bb0 a val 1 p addr 0x7fff5e968b40 p val 1 q addr 0x7fff5e968b38 q val 4*/ pointer++ can be reviewed as a same type pointer in current pointer’s neighbor, then do a pointer assigment: 12345678910111213141516171819int* tmp_p = 0x7fff5e744b30 ; int* p1 = 0x7fff5e744b28 ; p1 = tmp_p ;``` ### int++```c++int main(){ vector<int> nums ; for(int i=0; i<5; i++){ nums.push_back(i); } int p=0; std::cout << "nums[p++] " << nums[p++] << " p val " << p << endl ; return 0;} /* nums[p++]: 0 p val: 1 */]]></content>
<tags>
<tag>pure C</tag>
</tags>
</entry>
<entry>
<title><![CDATA[nodejs introduction]]></title>
<url>%2F2020%2F03%2F12%2Fnodejs-introduction%2F</url>
<content type="text"><![CDATA[callback / asnyc123456var fs = require("fs") ;fs.readFile("input.txt", function(err, data){ if(err) return console.error(err); console.log(data.toString());}) the async fun take the callback fun as its last parameter. event loop1234567891011var events = require('events')var emitter = new events.EventEmitter();var connectH = function connected(){ console.log("connected") emitter.emit('data_received'); // trig 'data_received'}emitter.on('connection', connectH);emitter.on('data_received', function(){ console.log('data received');})emitter.emit('connection'); // trig 'connection' event emitterwhen async IO is done, will send an event to the Event queue. e.g. when fs.readStream() open a file will trig an event. e.t.c 123addListener(event, listener)on(event, listener) #listeningemit(evnet, [arg1], ...) #trig file system1234567891011121314var fs = require("fs")fs.open("input.log", "r+", function(err, fd){});fs.state("input.log", function(err, stats_info){});fs.readFile("input.log", function(err, data){ if(err){ return console.error(err); } console.log(data.toString());});fs.writeFile("output.log", function(err){ if(err){ return console.error(err);} console.log("write successfully")});fs.read(fd,buffer, [args..]) #read binary stream bufferas js language has only txt bytes data, to deal with binary data, introduce Buffer 12345Buffer.alloc(size)Buffer.from(buffer||array||string)Buffer.write() #write to bufferbuffer.toString() #read from bufferbuffer.toJSON() stream123456var fs=require("fs");var readerStream = fs.createReadstream("input.file")readerStream.on('data', function(chunk, err){})var writeStream = fs.createWriteStream("output.file")writeStream.on('finished', function(){})readerStream.pipe(writeStream); #pipe from a reader stream to a writer stream module systemto enable different nodejs files can use each other, there is a module system, the module can be a nodejs file, or JSON, or compiled C/C++ code. nodejs has exports and require used to export modules’ APIs to external usage, or access external APIs. 12module.exports = function(){}exports.method = function(){} the first way export the object itself, the second way only export the certain method. Global Object12console.log()console.error() common modules path 123var path = require("path");path.join("/user/", "test1"); path.dirname(p_); http server 1234567891011121314var http = require("http");http.createServer(function(request, response){ var url_path = request.url ; server(url_path, function(err, data){ if(err){ console.log(err); response.writeHead(404, "xx"); }else{ response.writeHeead(200, "yy"); response.write(data.toString()); } response.end(); });}).listen(8080); http client 123456789101112var http = require('http')url = "http://localhost:8080/index.html"var req = http.request(url, callback);var callback = function(response){ var body = '' ; response.on('data', function(data){ body += data; }); response.on('end', function(){ console.log(body); });}; ExpressExpress has requst and response object to handle request and reponse. express.static can handle static resources, e.g. 
image, css e.t.c 12345678910111213141516171819202122232425262728293031323334var express = require("repress");var app = express();app.use('/public', express.static('public'));app.get('/index', function(req, res){})app.get('/user', function(req, res){ var response = { "name": req.query.name, "id": req.query.id }; res.send(JSON.stringify(response));});app.post('/user', function(req, res){ var response = { "name" : req.body.name ; "id" : req.body.id }; res.send(JSON.stringify(response));}); app.post('file_upload', function(req, res){ var des_file = __dirname + "/" + req.files[0].originalname ; fs.readFile(req.files[0].path, function(err, data){ fs.writeFile(des_file, data, function(err){ if(err){console.error(err);} else{ var response = { message: "file uploaded successfully" , filename: req.files[0].originalname }; } res.send(JSON.stringify(response)); }); });})var server = app.listen(8080, function(){}) res is what send from server to client, for both /get, /post methods. req object represents the HTTP request and has properties for the request query string, parameters, body, HTTP headers e.t.c req.body contains key-value pairs of data submitted in the request body. by default, it’s undefined, and is populated when using body-parsing middleware. e.g. body-parser req.cookies when using cookie-parser middleware, this property is an object that contains cookies send by the request req.path contains the path part of the request url req.query an object containing a property for each query string parameter in the route req.route the current mathced route, a string data access object(DAO)Dao pattern is used to separate low level data accessing API or operations form high level business services. usually there are three parts: DAO interface, which defines the standard operations to be performed on a model object DAO class, the class that implement DAO interfaces, this class is responsible to get data from database, or other storage mechanism model object, a simple POJO containing get/set methods to store data retrieved using DAO class o/r mapping (orm) is used a lot to map database itme to a special class, and it’s easy to use, but a little drawback of orm is it assume the database is normalized well. DAO is a middleware to do directly SQL mapping, who mapes SQL query language to the output class. separting models, logic and daos routes.js, where to put routes, usually referenced as controllers models.js, where to put functions talk to database, usually referenced as dao layer views.js these three components can put under app; all static data usually put under public folder; the Express package.json and index.js are at the same level as app. refernodejs at runoob.com nodejs & mysql bearcat-dao introduction koa chokidar]]></content>
<tags>
<tag>nodejs</tag>
</tags>
</entry>
<entry>
<title><![CDATA[high concurrent ws log server]]></title>
<url>%2F2020%2F03%2F11%2Fhigh-concurrent-ws-log-server%2F</url>
<content type="text"><![CDATA[backgroudpreviously, we designed a logger module from ws to db/, this blog is one real implementation of the python solution. high concurrent ws server onlinesingle server with multi clients: a simple C++ the server is started first, and wait for the incoming client calls, periodically report its status: how many clients keep connected. meanwhile, once an incoming call is detected and accepted, the server will create a separate thread to handle this client. therefore, the server creates as many separate sessions as there are incoming clients. how handle multiple clients? once an incoming client call is received and accepted, the main server thread will create a new thread, and pass the client connection to this thread what if client threads need access some global variables? a semaphore instance ish helpful. how ws server handles multiple incomming connections socket object is often thought to represent a connection, but not entirely true, since they can be active or passive. a socket object in passive/listen mode is created by listen() for incoming connection requets. by definition, such a socket is not a connection, it just listen for conenction requets. accept() doesn’t change the state of the passive socket created by listen() previously, but it returns an active/connected socket, which represents a real conenction. after accept() has returned the connected socket object, it can be called again on the passive socket, and again and again. or known as accept loop But call accept() takes time, can’t it miss incoming conenction requests? it won’t, there is a queue of pending connection requests, it is handled automatically by TCP/IP stack of the OS. meaning, while accept() can only deal with incoming connection request one-by-one, no incoming request will be missed even when they are incoming at a high rate. 
python env setupwebsockets module requires python3.6, the python version in bare OS is 3.5, which gives: 12345678910 File "/usr/lib/python3/dist-packages/websockets/compatibility.py", line 8 asyncio_ensure_future = asyncio.async # Python < 3.5 ^SyntaxError: invalid syntaxß``` basically, [asyncio.async](https://stackoverflow.com/questions/51292523/why-does-asyncio-ensure-future-asyncio-async-raise-a-syntaxerror-invalid-synt) and another depend module [discord.py](https://github.com/Rapptz/discord.py/issues/1396), require python < 3.5v, then gives another error:```pythonTypeError: unsupported operand type(s) for -=: 'Retry' and 'int' the Retry error fixed by adding the following lines to ~/.pip/pip.conf 12345678910111213141516[global]index-url = https://pypi.tuna.tsinghua.edu.cn/simple``` in a bare system with pre-installed python3.5, I did the following steps: ```pythonsudo apt install software-properties-commonsudo add-apt-repository ppa:deadsnakes/ppasudo apt-get install python3.7 sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 2sudo apt-get install python3-websocketssudo apt-get install python3-websocketsudo apt-get install python3-sqlalchemysudo pip3 install threadpoolsudo apt-get remove --purge python3-websocket #based on python3.5sudo apt-get install python3-websocket #based on python3.7 which gives another error: 123456/var/lib/dpkg/info/python3-websocket.postinst: 6: /var/lib/dpkg/info/python3-websocket.postinst: py3compile: not founddpkg: error processing package python3-websocket (--configure): subprocess installed post-installation script returned error exit status 127Errors were encountered while processing: python3-websocketE: Sub-process /usr/bin/dpkg returned an error code (1) which then be fixed by go to /var/lib/dpkg/info/ and delete all python3-websocket.* files: 123sudo rm /var/lib/dpkg/info/[package_name].*sudo dpkg --configure -asudo apt-get update everything looks good, but still report: 1ModuleNotFoundError: No module named 'websockets' Gave up setting up with the bare python, then create a new conda env, and ran the following settings inside, clean and simple: 12345pip install websocketspip install websocket-client #rather websocketpip install threadpoolpip install sqlalchemypip install psycopg2 during remote test, if ws server down unexpected, need kill the ws pid: 12345678910111213141516171819202122232425262728293031323334353637383940sudo netstat -tlnp #find the special running port and its pid kill $pid``` or `kill $(lsof -t -i:$port)`## websockets & websocket-clientthe long-lived connection sample from [ws-client](https://github.com/websocket-client/websocket-client) is good base, as here need to test a high concurrent clients, we add `threadpool`:```python def on_open(ws, num): def run(*args): ##args for i in range(3): time.sleep(1) message = "your_message" ws.send(json.dumps(message)) time.sleep(1) ws.close() print("thread terminating...") thread.start_new_thread(run, ()) def on_start(num): websocket.enableTrace(True) ws = websocket.WebSocketApp("ws://localhost:8888/", on_message = on_message, on_error = on_error, on_close = on_close) ws.on_open = on_open(ws, num) ws.run_forever() def threadpool_test(): start_time = time.time() pool = ThreadPool(100) test = list() for itr in range(100): test.append(itr) requests = makeRequests(on_start, test) [pool.putRequest(req) for req in requests] pool.wait() print('%d second'% (time.time() - start_time)) in ws-client src, we see: on_open: callable object which is called at opening websocket. 
this function has one argument. The argument is this class object. but all customized callback func can add more arguments, which is helpful. on_message: callable object which is called when received data. on_message has 2 arguments. The 1st argument is this class object. the 2nd argument is utf-8 string which we get from the server. we can implement a simple sqlalchemy orm db-writer, and add to the ws-server: async def process(self, websocket, path): raw_ = await websocket.recv() jdata = json.loads(raw_) orm_obj = orm_(jdata) try: self.dbwriter_.write(orm_obj) print(jdata, "write to db successfully") except Exception as e: dbwriter_.rollback() print(e) greeting = "hello from server" await websocket.send(greeting) print(f"> {greeting}") def run(self): if self.host and self.port : start_server = websockets.serve(self.process, self.host, self.port) else: start_server = websockets.serve(self.process, "localhost", 8867) asyncio.get_event_loop().run_until_complete(start_server) asyncio.get_event_loop().run_forever() in summaryin reality, each ws-client is integrated to one upper application, which generate messages/log, and send to ws-server, inside which write to db, due to asyncio, the performance is good so far. in future, we maybe need some buffer at ws-server. refera simple multi-client ws server create a simple python sw server using Tornado python: websockets python: json python: threadpool]]></content>
<tags>
<tag>websocket</tag>
</tags>
</entry>
<entry>
<title><![CDATA[design a logger from ws to db]]></title>
<url>%2F2020%2F03%2F09%2Fdesign-a-logger-from-ws-to-db%2F</url>
<content type="text"><![CDATA[backgroundfor now, we had design a file server from db to s3 and mf reader to feed data to the ads test module, which run in SIMD mode, e.g. k8s. for a closed-loop SIL, another component is how to record test message/outputs to database. I am thinking firstly event driven webframework python, or plusar, which is a popular python event-driven project. as lgsvl pythonAPI is really a good choice to transfer message among server and clients. so the basic idea is to integrate websocket/tcp channel inside aeb model websocket vs tcpmost ADAS models developed in OEM are matlab based, and the most common communication protocol in Matlab is UDP, a little review about tcp vs udp, I personally don’t like udp, no reason. so narrow to TCP, the further question is: should it be a raw TCP or Websocket), websocket enables a stream of messages instead of a stream of bytes. WebSockets is built on normal TCP sockets and uses frame headers that contains the size of each frame and indicate which frames are part of a message. The WebSocket API re-assembles the TCP chunks of data into frames which are assembled into messages before invoking the message event handler once per message. WebSocket is basically an application protocol (with reference to the ISO/OSI network stack), message-oriented, which makes use of TCP as transport layer. btw, TCP is bidirectional because it can send data in both directions, and it is full-duplex because it can do that simultaneously, without requiring line turnarounds, at the API level for ADAS test outputs, it’s plain txt, I’d prefer websocket. so make the decision. matlab websocket clientthere is an github project: matlab websocket, which is enough to implement a simple websocket in Matlab. discussion at stackoverflow basically, here we bind a weboscket client to the adas model, in matlab. which is a good choice, so when runnign adas model in massively, the whole system looks like multi ws clients concurrently communicate with one ws server, which is well separated from adas model, and can freely implmented in either nodejs, python or c# e.t.c., totally friendly to other third-party data analysis tools in down-stream. multi-write db concurrentlyonce we get the message from ws clients, we need transfer all of them to a database, e.g. pySQL. but the first thing need to understand is whether SQL can concurrent inserts. if there are multiple INSERT statements, they are queued and performed in sequence, concurrently with the SELECT statements. SQL doesn’t support parallel data inserts into the same table, with InnoDB or MyISAM storage engine. this also answered another question, why not directly adas model write to db, since the parallel runinng cases increase, writing DB will be the bottleneck. so has the websocket middle level, is a good choice. nodejs solutionnodejs is really powerful to implement the solution above. basically, a ws server and sql writer: mysqljs python solutionsimilar as nodejs, python has matured websocket module and ORM moduel to write db. referliaoxuefeng tcp zhihu mysql lock nodejs connect mysql]]></content>
</entry>
<entry>
<title><![CDATA[asam mdf reader]]></title>
<url>%2F2020%2F03%2F07%2Fasam-mdf-reader%2F</url>
<content type="text"><![CDATA[previously reviewed a few open source mdf readers as well as get basic concept about mdf. here is a working process with asammdf, which is step by step to understand this lib. the basic problem here is to read a special signal data, not know its channel and group id. first try1234567891011121314151617reader = MDF4(file_name)data_dict = reader.info()print("the total len of dict: ", len(data_dict))print(data_dict.keys())block_ = data_dict['group 117']channel_metadata = reader.get_channel_metadata('t', 117)channel117_comment = reader.get_channel_comment('t', 117)channel117_name = reader.get_channel_name(117, 0)pattern="channel 123"for key in block_ : r1 = re.search(pattern, str(block_[key]), re.M|re.I) if r1: print(key)data_ = reader.get("channel 123") #doesn't work, as it reports can't find Channeldata_ = reader.get("fus_lidar_camera", "group 123") #doesn't work, as it reports can't find the Groupdata_ = reader.get("fus_lidar_camera") #works so first, “channel 123” is actually not the name. and with the right name can reading data, but no guarantee which group this channel is from, and further can’t know is there additional groups has record this type of data. second tryas mentioned mdf4 groups have attributes, how about to access these attributes. so far, especially helpful attributes are: groups:list, list of data group dicts channels_db:dict, used for fast channel access by name. for each name key the value is a list of (group index, channel index) tuples 123456789channels = reader.groups[2] #with the same order as found in mdf, even with index 0, 1, ...print("type of channels: ", type(channels)) #dictif "fus.ObjData_Qm_TC.FusObj.i17.alpPiYawAngle" in channels: print(channels["fus.ObjData_Qm_TC.FusObj.i17.alpPiYawAngle"].id) #noneprint(" chanels type: " , type(channels["channels"])) #list print(" chanels len: " , len(channels["channels"])) #577print("chanel_group type: " , type(channels["channel_group"])) # ChannelGroup objectprint("chanel_group len: " , len(channels["channel_group"])) #17print(" data_group type: " , type(channels["data_group"])) #DataGroup objectprint(" data_group len: " , len(channels["data_group"])) #10 which is far more less as expected, but give an good exploring about the reader’s attributes. third try12345678910111213141516171819reader = MDF4(file_name)cdb = reader.channels_db#print("size of c_db ", len(cdb))#print("type of cdb ", type(cdb))#for more sigles interested, please append to this listsig_list = ['sensor_1', 'sensor_2', 'sensor_3']for sig in sig_list: if sig in cdb: print(cdb[sig]) for item in cdb[sig]: (gid, cid) = item alp = reader.get(sig, gid, cid) print("type of current sig ", type(alp)) print("size of current sig ", len(alp)) #alp is the raw sig data, here is only test print else: print(sig, " is not in cdb" which gives me the right way to access the interested signals’ raw data. combing with s3 streaming body read, as mentioned in previous mdf-reader, which finally gives the powerful pipeline from reading mdf4 files from s3 storage to downside appilication referpyton reg python substring]]></content>
</entry>
<entry>
<title><![CDATA[design a nodejs web server from db to s3]]></title>
<url>%2F2020%2F03%2F05%2Fdesign-a-nodejs-web-server-from-db-to-s3%2F</url>
<content type="text"><![CDATA[backgroundduring ADS road test, there are Tb~Pb mount of data generated, e.g. sensor raw data(rosbag, mf4), images, point cloud e.t.c. previous few blogs are focusing on data storage. for a more robost and user-friendly data/file server, also need consider database and UI. from the end-user/engineers viewpoint, a few basic functions are required: query certain features and view the data/files filtered download a single/a batch of interested files (for further usage or analysis) upload large mount of files quickly to storage (mostly by admin users) a FTP server to support s3traditionally, for many large size files to download, FTP is common used. the prons and corns comparing fttp and ftp for transfering files: HTTP is more responsive for request-response of small files, but FTP may be better for large files if tuned properly. but nowadays most prefer HTTP. doing search a little more, there are a lot discussions to connect amazon s3 to ftp server: transfer files from s3 storage to ftp server FTP server using s3 as storage using S3 as storage for attachments in a web-mail system FTP/SFTP access to amazon s3 bucket and there are popular ftp clients which support s3, e.g. winSCP, cyberduck, of course, aws has it own sftp client, as well as aws s3 browser windows client), more client tools check here however, ftp can’t do metadata query. for some cases, e.g. resimulation of all stored scenarios, which makes no difference for each scenario, we can grab one by one and send it to resmiluator; but for many other cases, we need a certain pattern of data, rather than reading the whole storage, then a sql filter is much efficient and helpful. so a simple FTP is not enough in these cases. s3 objects/files to dbstarting from a common bs framework, e.g. react-nodejs, and nodejs can talk to db as well. ** nodejs query buckets/object header info from s3 server, and update these metadata into db. there is a great disscussion about storing images in db - yea or nay: when manage many TB of images/mdf files, storing file paths in db is the best solution: db storage is more expensive than file system storge you can super-acc file system access: e.g. os sendfile() system call to asynchronously send a file directly from fs to network interface, sql can’t web server need no special coding to access images in fs db win out where transactional integrity between image/file and its metadata are important, since it’s more complex to manage integrity between db metdata to fs data; and it’s difficult to guarantee data has been flushed to disk in the fs so for this file server, the metadata include file-path-in-s3, and other user interested items. 12345file_id feature file_path 1 1 http://s3.aws.amazon.com/my-bucket/item1/img1a.jpg 2 2 http://s3.aws.amazon.com/my-bucket/item1/img1b.jpg 3 3 http://s3.aws.amazon.com/my-bucket/item2/img2a.jpg 4 4 http://s3.aws.amazon.com/my-bucket/item2/img2b.jpg ** during browser user query/list request, nodejs talk to db, which is a normal bs case. 
** when the browser user want to download a certain file, then nodejs parse the file metadata and talk to s3 nodejs to s3nodejs fs.readFile()taking an example from official nodejs fs doc: 123456789101112const fs = require('fs')fs.readFileSync(pathname, function(err, data){ if(err){ res.statusCode = 500 ; res.end(`Err getting file: $[err}`) }else{ res.end(data); }}); const fileUrl = new URL('file://tmp/mdf')fs.readFileSync(fileUrl); if not file directly, maybe fs.Readstream class is another good choice to read s3 streaming object. fs.readFile() and fs.readFileSync() both read full content of the file in memory before returning the data. which means, the big files are going to have a major impact on your memory consumption adn speed of execution of the program. another choice is fs-readfile-promise. express res.downloadres object represent the HTTP response that an Express app sends when it gets an HTTP request. expressjs res.download 12345678res.downlaod('/ads/basic.mf4', 'as_basic.mf4', function(err){ if(err){ log(`download file error: ${err}`) }else{ log(`download file successfully`) }}) aws sdk for nodejstaking an example from aws sdk for js 12345678910var aws = require('aws-sdk')var s3 = new aws.S3()s3.createBucket({Bucket: your_bucket_name, function(){ var params={Bucket: your_bucket_name, Key: your_key, Body: mf4.streaming}; s3.putObject(params, function(err, data){ if(err) console.log(err) else console.log("upload data to s3 succesffully") });}); check aws sdk for nodejs api for more details. in summaryeither a FTP server or a nodejs server, it depends on the upper usage cases. a single large-size(>100mb) file(e.g. mf4, rosbag) download, nodejs with db is ok, as db helps to filter out the file first, and a few miniutes download is needed many of little-size(~1mb) files(e.g. image, json) downlaod, nodejs is strong without doubt. many of large-size files download/upload, a friendly UI is not necessary, comparing to the performance, then FTP may be the solution.]]></content>
<tags>
<tag>aws</tag>
<tag>web server</tag>
</tags>
</entry>
<entry>
<title><![CDATA[driving scenarios on-going]]></title>
<url>%2F2020%2F03%2F04%2Fdriving-scenarios-on-going%2F</url>
<content type="text"><![CDATA[Euro NCAP driving scenariosAdaptive Cruise Control(ACC)ACC is tested in an extended version of AEB. these systems are designed to automatically adapt the speed when approaching a slower moving vehicle or noe that is braking. notification, not all systems perform well with stationary obstacles. a slow moving vehicle test a stationary obstacle(e.g. vehicle) test autonomous emergency braking(AEB)AEB pedestrain an adult pedestrain running from the driver’s side of the vehicle an adult walk from the passenger’s side a child runs from between parked cars on the passenger’s side of the car pedestrain walking in the same direction as the vehicle algnedwith the centre of vehicle pedestrain walking in the same direction as the vehicle offset to one side AEB cyclist cyclist is crossing the vehicle’s path cyclist is travelling in the same direction Euro NCAP gives greatest reward if a collision is completely avoided; in some case, if AEB is unable to stop the vehicle completely, still give some points. lane keeping assist(LKA) lane centering, a good ADS should continue to support the driver during the manoeuvre and will not resist the driver or deactivate. cut-in test, a npc from adjacent lane merges into the current lane cut-off test, a npc leaves the current lane abruptly to avoid a stopped vehicle ahead swerve around a small obstacle test speed assistance test, to use map data and/or data from sensors to identify the local speed limit NHTSA driving scenariosa previous blog design scenarios in ADS simulation is based on NHTSA. forward collision prevention Forward collision warning AEB pedestrain AEB Adaptive lighting backing up & parking rear automatic braking backup camera rear cross traffic alert lane & side assist lane departure warning lane keeping assist blind spot detection lane centering assist maintaing safe distance traffic jam assist highway pilot ACC Matlab Driving Scenario samplesAEB (for vulnerable road users) AEB_Bicyclist_Longitudinal_25width, at collision time, the bicycle is 25% of the way across the width of the ego vehicle. AEB_CCRb_2_initialGap_12m, a car-to-car rear braking(CCRb) scenario where the ego rear hit a braking agent. the deceleration is 2 m/s^2 with initial gap 12m AEB_CCRm_50overlap, a car-to-car rear moving(CCRm) scenario, where the ego rear hits a moving vehicle. at collisio ntime, the ego overlaps with 50% of the width of the moving vehicle AEB_CCRs_-75overlap, a car-to-car stationary(CCRs) sceanrio, where the ego rear hits a stationary vehicle. at collision time, the ego overlaps with 75% of the width of the stationary vehicle. when the ego is to the left of the other vehicle, the percent overlap is negative AEB_Pedestrain_Farside_50width, the ego collides with a pedestrain who is traveling from the left side(far side, assuming vehicle drive on right side of the road). 
at collision time, the pedestrian is 50% of the way across the width of the ego. AEB_PedestrainChild_Nearside_50width, the ego collides with a pedestrian who is traveling from the right side (near side); at collision time, the pedestrian is 50% of the way across the width of the ego. Emergency Lane Keeping (ELK) ELK_FasterOvertakingVeh_Intent_Vlat_0.5, the ego intentionally changes lanes but collides with a faster overtaking vehicle; the ego travels at a lateral velocity of 0.5 m/s ELK_OncomingVeh_Vlat_0.3, the ego unintentionally changes lanes and collides with an oncoming vehicle, with the ego's lateral velocity at 0.3 m/s ELK_OvertakingVeh_Unintent_Vlat_0.3, the ego unintentionally changes lanes, overtakes a vehicle in the other lane and collides; the ego travels at a lateral velocity of 0.3 m/s ELK_RoadEdge_NoBndry_Vlat_0.2, the ego unintentionally changes lanes and ends up at the road edge, with lateral velocity at 0.2 m/s LKA LKA_DashedLine_Solid_Left_Vlat_0.5 LKA_DashedLine_Unmarked_Right_Vlat_0.5 LKA_RoadEdge_NoBndry_Vlat_0.5 LKA_RoadEdge_NoMarkings_Vlat_0.5 LKA_SolidLine_Dashed_Left_Vlat_0.5 LKA_SolidLine_Unmarked_Right_Vlat_0.5 open source libs Microsoft: autonomous driving cookbook TrafficNet: an open scenario lib Toyota: vehicle simulation unity toolkit mathworks: github sensetime: driving stereo dataset google: open simulator interface cmu: ml4ad scenarios in L2 vs L3+: scenarios in L2 are more like points in the whole driving maneuver, i.e. how the ADAS system responds at a certain scenario/point; an L3+ scenario is continuous, every point during driving is part of the scenario, and all kinds of L2 scenarios can happen inside it. it does make sense to integrate the L2 scenario planning response into L3+, which makes the L3+ system more strict. refer: 2018 Automated driving test mathworks: Euro NCAP driving scenarios in driving scenario designer Euro NCAP roadmap 2025 nhtsa vehicle safety legacy doc NHTSA driver assistance tech Study and adaptation of the autonomous driving simulator CARLA for ATLASCAR2 implementing RSS model on NHTSA pre-crash scenarios]]></content>
</entry>
<entry>
<title><![CDATA[boto3 streamingBody to BytesIO]]></title>
<url>%2F2020%2F03%2F03%2Fboto3-streamingBody-to-BytesIO%2F</url>
<content type="text"><![CDATA[boto3 streamingBody to BytesIOboto3 class into A Session is about a particular configuration. a custom session: 123session = boto3.session.Session()ads = session.client('ads')ads_s3 = session.resource('s3') Resources is an object-oriented interface to AWS. Every resource instance has a number of attributes and methods. These can conceptually be split up into identifiers, attributes, actions, references, sub-resources, and collections. 1obj = ads_s3.Object(bucket_name="boto3", key="test.mdf") Client includes common APIs: 123456789101112131415161718copy_object()delete_object()create_bucket()delete_bucket()delete_objects()download_file()download_fileobj()get_bucket_location()get_bucket_policy()get_object()head_bucket()head_object()list_buckets()list_objects()put_bucket_policy()put_object()upload_file()upload_fileobj() Service Resource have bucket and object subresources, as well as related actions. Bucket, is an abstract resource representing a S3 bucket. 1b = ads_s3.Bucket('name') Object, is an abstract resource representing a S3 object. 1obj = ads_s3.Object('bucket_name', 'key') read s3 object & pipeline to mdfreaderthere are a few try-and-outs. first is to streaming s3 object as BufferedReader, which give a file-like object, and can read(), but BufferedReader looks more like a IO streaming than a file, which can’t seek. botocore.response.StreamingBody as BufferedReaderthe following discussion is really really helpful:boto3 issue #426: how to use botocore.response.StreamingBody as stdin PIPE at the code of the StreamingBody and it seems to me that is is really a wrapper of a class inheriting from io.IOBase) but only the read method from the raw stream is exposed, so not really a file-like object. it would make a lot of sense to expose the io.IOBase interface in the StreamingBody as we could wrapped S3 objects into a io.BufferedReader or a io.TextIOWrapper.read() get a binary string . the actual file-like object is found in the ._raw_stream attribute of the StreamingBody class 1234import iobuff_reader = io.BufferedReader(body._raw_stream)print(buff_reader)# <_io.BufferedReader> wheras this buff_reader is not seekable, which makes mdfreader failure, due to its file operate needs seek() method. steam a non-seekable file-like object stdio stream to seekable file-like objectso I am thinking to transfer the BufferedReader to a seekable file-like object. first, need to understand why it is not seekable. BufferedRandom is seekable, whereas BufferedReader and BufferedWriter are not. Buffered streams design: BufferedRandom is only suitable when the file is open for reading and writing. The ‘rb’ and ‘wb’ modes should return BufferedReader and BufferedWriter, respectively. is it possbile to first read() the content of BufferedReader to some memory, than transfer it to BufferedRandom? which gives me the try to BufferedReader.read(), which basicaly read all the binaries and store it in-memoryA, then good news: in-memory binary streams are also aviable as Bytesio objects: f = io.BytesIO(b"some in-memory binary data") what if assign BytesIO to this in-memoryA. which really gives me a seekable object: fid_ = io.BufferedReader(mf4_['Body']._raw_stream) ; read_in_memory = fid_.read() bio_ = io.BytesIO(read_in_memory); then BytesIO object pointer is much more file-like, to do read() and seek(). 
refer: boto3 doc boto3 s3 api samples mdf4wrapper ifstream to FILE what is the concept behind file pointer or stream pointer using io.BufferedReader on a stream obtained with open working with binary data in python read binary file and loop over each byte smart_open project PEP 3116 – New I/O]]></content>
<tags>
<tag>aws</tag>
</tags>
</entry>
<entry>
<title><![CDATA[ceph & boto3 introduction]]></title>
<url>%2F2020%2F02%2F28%2Fceph-boto3-intro%2F</url>
<content type="text"><![CDATA[ceph introA Ceph Storage Cluster requires at least one Ceph Monitor, Ceph Manager, and Ceph OSD (Object Storage Daemon) ceph storage clusterThe Ceph File System, Ceph Object Storage and Ceph Block Devices read data from and write data to the Ceph Storage Cluster. cluster operations data placement pools, which are logical groups for storing objects Placement Groups(PG), PGs are fragments of a logical object pool CRUSH maps, provide the physical topology of the cluster to the CRUSH algorithm to determine where the data for an object and its replicas should be stored, and how to do so across failure domains for added data safety among other things. Balancer, a feature that will automatically optimize the distribution of PGs across devices to achieve a balanced data distribution Librados APIs workflowapis can interact with: Ceph monitor as well as OSD. get libs configure a cluster handling the client app must invoke librados and connected to a Ceph Monitor librados retrieves the cluster map when the client app wants to read or write data, it creates an I/O context and bind to a pool with the I/O context, the client provides the object name to librados, for locating the data. then the client application can read or write data Thus, the first steps in using the cluster from your app are to 1) create a cluster handle that your app will use to connect to the storage cluster, and then 2) use that handle to connect. To connect to the cluster, the app must supply a monitor address, a username and an authentication key (cephx is enabled by defaul an easy way, in Ceph configuration file: 123[global]mon host = 192.168.0.1keyring = /etc/ceph/ceph.client.admin.keyring 1234567891011121314151617import radostry: cluster = rados.Rados(conffile='')except TypeError as e: print 'Argument validation error: ', e raise eprint "Created cluster handle."try: cluster.connect()except Exception as e: print "connection error: ", e raise efinally: print "Connected to the cluster." python api, has default admin as id, ceph as cluster name, and ceph.conf as confffile value creating an I/O contextRADOS enables you to interact both synchronously and asynchronously. Once your app has an I/O Context, read/write operations only require you to know the object/xattr name. 12345678910111213141516171819202122232425262728293031323334353637print "\n\nI/O Context and Object Operations"print "================================="print "\nCreating a context for the 'data' pool"if not cluster.pool_exists('data'): raise RuntimeError('No data pool exists')ioctx = cluster.open_ioctx('data')print "\nWriting object 'hw' with contents 'Hello World!' to pool 'data'."ioctx.write("hw", "Hello World!")print "Writing XATTR 'lang' with value 'en_US' to object 'hw'"ioctx.set_xattr("hw", "lang", "en_US")print "\nWriting object 'bm' with contents 'Bonjour tout le monde!' 
to pool 'data'."ioctx.write("bm", "Bonjour tout le monde!")print "Writing XATTR 'lang' with value 'fr_FR' to object 'bm'"ioctx.set_xattr("bm", "lang", "fr_FR")print "\nContents of object 'hw'\n------------------------"print ioctx.read("hw")print "\n\nGetting XATTR 'lang' from object 'hw'"print ioctx.get_xattr("hw", "lang")print "\nContents of object 'bm'\n------------------------"print ioctx.read("bm")print "Getting XATTR 'lang' from object 'bm'"print ioctx.get_xattr("bm", "lang")print "\nRemoving object 'hw'"ioctx.remove_object("hw")print "Removing object 'bm'"ioctx.remove_object("bm") closing sessions12345print "\nClosing the connection."ioctx.close()print "Shutting down the handle."cluster.shutdown() librados in Pythondata level operations configure a cluster handle To connect to the Ceph Storage Cluster, your application needs to know where to find the Ceph Monitor. Provide this information to your application by specifying the path to your Ceph configuration file, which contains the location of the initial Ceph monitors. 12import rados, syscluster = rados.Rados(conffile='ceph.conf') connect to the cluster 123456789cluster.connect()print "\nCluster ID: " + cluster.get_fsid()print "\n\nCluster Statistics"print "=================="cluster_stats = cluster.get_cluster_stats()for key, value in cluster_stats.iteritems(): print key, value manage pools 12345cluster.create_pool('test')pools = cluster.list_pools()for pool in pools: print poolcluster.delete_pool('test') I/O context to read from or write to Ceph Storage cluster, requires ioctx. 123ioctx = cluster.open_ioctx($ioctx_name)#ioctx = cluster.open_ioctx2($pool_id) ioctx_name is the name of the pool, pool_id is the ID of the pool read, write, remove objects 123ioctx.write_full("hw", "hello")ioctx.read("hw")ioctx.remove_object("hw") with extended attris 123456789101112131415ioctx.set_xattr("hw", "lang" "en_US")ioctx.get_xattr("hw", "lang")``` * list objs ```python obj_itr = ioctx.list_objects()while True: try: rados_obj = obj_itr.next() print(rados_obj.read()) RADOS S3 apiCeph supports RESTful API that is compatible with basic data access model of Amazon S3 api. 12345PUT /{bucket}/{object} HTTP/1.1DELETE /{bucket}/{object} HTTP/1.1GET /{bucket}/{object} HTTP/1.1HEAD /{bucket}/{object} HTTP/1.1 //get object info GET /{bucket}/{object}?acl HTTP/1.1 Amazon S3Simple Storage Serivce(s3) is used as file/object storage system to store and share files across Internet. it can store any type of objects, with simple key-value. elastic computing cluster(ec2) is Amazon’s computing service. there are three class in S3: servivce, bucket, object. there are two ways to access S3, through SDK(boto), or raw Restful API(GET, PUT). the following is SDK way. create a bucketPut(), Get(), return all objects in the bucket. Bucket is a storage location to hold files(objects). 123456789101112131415161718def create_bucket(bucket_name, region=None): try: if region is None: s3_client = boto3.client('s3') s3.client.create_bucket(Bucket=bucket_name) else: s3_client = boto3.client('s3', region_name=region) location = {'LocationConstraint': region} s3_client.create_bucket(Bucket=bucket_name, CreateBucketConfiguration=location) except ClientError as e: logging.error(e) return False return True response = s3_client.list_buckets()for bucket in response['Buckets']: print bucket upload filesbasically to upload a file to an S3 bucket. 
12345678910111213def upload_file(file_name, bucket, object_name=None): if object_name is None: object_name = file_name # Upload the file s3_client = boto3.client('s3') try: response = s3_client.upload_file(file_name, bucket, object_name) except ClientError as e: logging.error(e) return False return True upload_file() handles large files by splitting them into smaller chunks and uploading each chunk in parallel. which is same logic as ceph to hide the lower-level detail of data splitting and transfering. upload_fileobj() accepts a readable file-like object, which should be openedin bin mode, not text mode: 123s3 = boto3.client('s3')with open("FILE_NAME", "rb") as f: s3.upload_fileobj(f, "BUCKET_NAME", "OBJECT_NAME") all Client, Bucket and Object have these two methods. download filesdownload_file() a pair of upload files method, which accepts the names of bucket and object to download, and the filename to save the downloaded file to. 12s3 = boto3.client('s3')s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME') same as upload file, there is a read-only open method: download_fileobj(). bucket policy123456s3 = boto3.client('s3')result = s3.get_bucket_policy(Bucket='BUCKET_NAME')bucket_policy = json.dumps(bucket_policy)s3.put_bucket_policy(Bucket=bucket_name, Policy=bucket_policy)s3.delete_bucket_policy(Bucket='BUCKET_NAME')control_list = s3.get_bucket_acl(Bucket='BUCKEET_NAME') get objects12obj_list = s3.list_objects(Bucket='bucket_name')obj_cont = s3.get_object(Bucket='bucket_name', Key='file_key') error: botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: Unknown read objects through url (restful api)through url path to get a file/object: https://<bucket-name>.s3.amazonaws.com/<key> referceph official doc botocore official doc boto3 github amazon S3 example GUI: create an S3 bucket and store objects in AWS boto API access S3 object]]></content>
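as a follow-up to the "read objects through url" note in the entry above, a presigned URL can be generated with boto3 instead of hand-building the bucket/key path; a small illustrative sketch, with made-up bucket and key names:

```python
import boto3

s3 = boto3.client('s3')

# returns a time-limited HTTPS link to the object that any HTTP client can fetch
url = s3.generate_presigned_url(
    ClientMethod='get_object',
    Params={'Bucket': 'demo-bucket', 'Key': 'sample.mf4'},
    ExpiresIn=3600,            # seconds the link stays valid
)
print(url)
```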
<tags>
<tag>aws</tag>
</tags>
</entry>
<entry>
<title><![CDATA[from HPC to cloud (2) - distributed storage intro]]></title>
<url>%2F2020%2F02%2F27%2Ffrom-HPC-to-cloud-2%2F</url>
<content type="text"><![CDATA[常见分布式存储背景 块存储 vs 文件存储 文件存储读写慢,利于共享;块存储读写快,不利于共享。所有磁盘阵列都是基于Block块的模式,而所有的NAS产品都是文件级存储。物理块与文件系统之间的映射关系:扇区→物理块→逻辑块→文件系统。所以,文件级备份,需要一层一层往下查找,最后才能完成整个文件复制。 对象存储 vs 块存储 富含元数据的对象存储使大规模分析成为企业的有吸引力的命题。对象存储也使得提供和管理跨越地理位置的存储变得相对经济。同时,块存储的属性使其成为高性能业务应用程序、事务数据库和虚拟机的理想选择,这些应用程序需要低延迟、细粒度的数据访问和一致的性能。块存储(云盘)只能被一个主机挂载。 文件存储 vs 对象存储 对象存储(ceph, Linux block device)和文件系统(hdfs, gfs, restful Web API)在接口上的本质区别是对象存储不支持fread(), fwrite()类似的随机位置读写操作,即一个文件PUT到对象存储里以后,如果要读取,只能GET整个文件,如果要修改一个对象,只能重新PUT一个新的到对象存储里,覆盖之前的对象或者形成一个新的版本。对象存储的接口是REST风格的,通常是基于HTTP协议的RESTful Web API,通过HTTP请求中的PUT和GET等操作进行文件的上传即写入和下载即读取,通过DELETE操作删除文件。文件存储读写api如,file.open(); 对象存储读写api如 s3.open()。文件存储,多次读写;对象存储,一次写,多次读。对象存储的容量是无上限的;文件存储的容量达到一定上限,性能会极差。 对象存储,需要对原数据做解析吗? 存储类型 块存储Direct Attach Storage(DAS), 每台主机服务器有独立的存储设备,各存储设备之间无法互通,需要跨主机存取。 Storage Area Network(SAN), 高速网络连接的专业存储服务器,与主机群通过高速i/o连结。 文件存储Network Attached Storage(NAS), 连接在网络上并提供存取服务。采用NFS或CIFS命令集访问数据,以文件为传输协议,通过TCP/IP实现网络化存储,可扩展性好、价格便宜、用户易管理 对象存储核心是将数据通路(数据读或写)和控制通路(元数据)分离,并且基于对象存储设备(Object-based Storage Device,OSD)构建存储系统。对象存储设备具有一定的智能,它有自己的CPU、内存、网络和磁盘系统. 一般配置metadata server(MDS)。 存储用例 对象存储非结构化(文档、图像、视频等)、备份存档、大数据分析(无实时性) 块存储比较适合数据库业务、虚拟机 文件存储企业共享、hpc大数据分析、流媒体处理,web服务。 容器云容器云架构12345控制台门户(管理员入口)saas()paas(容器,微服务、中间件、cicd)资源调度管理平台iaas(计算、存储、网络),虚拟化提供资源池, 物理机直接架构k8s 容器云与私有云搭建的硬件资源分开。 虚拟层上支持k8s 容器部署在虚拟机上的原因:既需要容器,也需要虚拟机的环境。一般互联网公司倾向分开容器的资源 和 虚拟机的资源。不管是容器云,还是虚拟机,都会有一个web管理平台。考虑弹性升缩,vm更灵活。而如果容器直接部署在x86裸机上,如果没有容器的管理层,docker容器挂了,就无法恢复。 应用部署容器云 vs 私有云 仿真计算,倾向容器环境 CI/CD开发工具、中间件,倾向容器环境 web server,倾向容器环境 sql,倾向物理机或虚拟机 参考KingStack UCloud EasyStack]]></content>
</entry>
<entry>
<title><![CDATA[from HPC to cloud computing]]></title>
<url>%2F2020%2F02%2F25%2Ffrom-HPC-to-cloud-computing%2F</url>
<content type="text"><![CDATA[HPCAnsys HPC, OpenFoam in amazon EC2, rescale, these are some samples of current CAE applications in HPC cloud. most CAE simulations, in nut, are solving large sparse matrix, either optmization or differential, which depends on famous linear equations solvers/libs, e.g. Petsc, libMesh, Intel Math Kernel e.t.c, inside which are implemented with message passing interface(MPI), share memory, or similar techs. why MPI is a have-to in these applications? because the problem interested itself is so huge, which is so much computing heavily than windows office, WeChat. as engineers want the result more preciesly, the problem interested dimension will increase more. e.g. a normal static stress analysis of vehicle engine is about 6 million elements, each of which will take 6 ~ 12 vertexes, and each vertex has 6 DoF, which gives about a kinematic matrix with size about 200 million x 200 million, no single PC can store this much data, and not even to do the matrix computing then. so all these problems have to be running on super computers, or high-performance computer(HPC), which have more memory or CPU cores. beyond large memory and CPU cores, also need fast-speed network, as each CPU can do calculation really fast, 2 ~ 4 GHz, so if the system can’t feed data that fast, the CPUs are hungry, which is the third feature of HPC: Gb high-speed internal network. as the NUMA/multi-core CPU architecture is popular now, to achieve the best performance from these chips is also a hot topic. cloudif taken HPC as centerlized memory and CPU cores(the Cathedral), then cloud is about distributed memory and CPU cores(the Bazaar). cloud is more Internet, to connected weak-power nodes anywhere, but totally has great power. the applications in cloud must be easy to decomposed in small pieces, so each weak-power node can handle a little bit. when coming to cloud computing, I’d think about Hadoop, OpenStack, and docker. hadoop is about to analysis massive data, which is heavily used in e-commercial, social network, games. docker is the way to package small application and each of the small image can run smoothly in one node, which currently is call micro-serivce, which is exactly as the name said, MICRO. OpenStack take care virtualization layer from physical memory, cpu resources, which used about in public cloud, or virtual office env. A or Bcloud is more about DevOps, as each piece of work is not difference from the single PC, but how to deploy and manage a single piece of work in a large cloud is the key, so develop and operations together. compare to HPC, both develop and operation needs professions. cloud computing has extended to edge computing, which is more close to normal daily life, and more bussiness-valued; while HPC guys are more like scientist, who do weather forcast, rocket science, molecular dynamics e.t.c. finally, the tech itself, cloud or HPC, has nothing to do with the bussiness world. refercloudscaling: grid, hpc, cloud… what’s the difference linkedin: supver computer vs cloud]]></content>
</entry>
<entry>
<title><![CDATA[mdf4 reader]]></title>
<url>%2F2020%2F02%2F24%2Fmdf4-reader%2F</url>
<content type="text"><![CDATA[read mdf4 in general the most important “branch” in the tree is the list of data groups (DG block). record used to store the plain measurement data, the records can either be contianed in one single “data” (DT) block, or distributed over several DT blocks using a “data list” block . Each DG block has points to the data block, as well as necessary information to understand and decode the data. the sub-tree of DG is channel group(CG) blocks and channel(CN) blocks. record layouteach record contains all signal values which have the same time stamp, in the same channel. for different channel groups(CG), each record must start with a “record id” the record layout of ID 1, will be described by one channel group, the record layout of ID 2 by another channel group. Both channel groups must be children of the DT block’s parent data group. channelsEach channel describes a recorded signal and how its values are encoded in the records for the channel’s parent CG block. vector APIswhich is licensed by vector GetFileManager() OpenFile() GetChannelByName() GetChannelGroup() CreaterDataPointer() GetRecordCount() SetReadMode() SeekPos() GetPos() GetTimeSec() ReadPhysValueDouble(pChannle, &value, &isvalid) ReadRawValueDouble(pChannel, &rawValue) ReadPhysValueDouble(pChannel, &phyValue) MultiReadPhysValueDouble(pChannel, valueBuffer, size, &read) turbolabReadFileMF4 class: Open() Seek(pos) Read(Size, Buffer) Close() CMdf4Calc class : Cmdf4Calc(pChan, pblock) Cmdf4Calc(pChan, mf4file) cmf4DataGroup class: GetRecord() GetRawValueFromRecord() GetValueFromRecord() mdf4FileImport class : ImportFile() ReleaseFile() I am not reading this C++ code. asammdfMDF class init(name=None, ): name can be either string or BytesIO filter(channels, ) get_group(index, channels=None, ) iter_channels(,) iter_get(channel_name=None,group_id=None, group_index=None, ) iter_groups() to_dataframe(), generate pandas DataFrame whereis(channel_name) MDF4 classincludes data_group, channel_group, channels, data_block, data_location, record_size e.t.c get(channel_name=None, group=None, ) get_can_signal() get_channel_name(group, index) get_channel_unit(channel_name=Nont, group=None, ) get_master(index, data=None, ), return the master channel samples for given group info() the following are sub data blocks of MDF4: Channel class ChannelGroup class DataGroup class DataList class DataBlock class HeaderBlock class 12mdf = MDF4("basic.mf4")db_info = mdf.info() all group, channel data is packages as python dict. the most exciting part, as asam-mdf can support BytesIO, which gives the way to read mdf files stores in Amazon S3: 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758 mf4_ = obj_api.get_object("bucket_name", "sample.mf4") fid_ = io.BufferedReader(mf4_['Body']._raw_stream) ; read_in_memory = fid_.read() bio_ = io.BytesIO(read_in_memory); if bio_.seekable(): mdf_ = MDF(bio_) mdf_.info()``` ### installation of asam-mdf* [install pandas](https://www.pypandas.cn/en/docs/installation.html#installing-using-your-linux-distribution’s-package-manager)`python3 -m pip install pandas` (not working)`sudo apt-get install python3-pandas` * [install canmatrix](https://canmatrix.readthedocs.io/en/latest/installation.html)`sudo python3 setup.py install` * install pathlib2 `sudo -H pip3 install pathlib2 `if `WARNING: pip is being invoked by an old script wrapper. 
This will fail in a future version of pip.`using: `sudo -H python3 -m pip install(包名)`## [mdfreader](https://github.com/ratal/mdfreader)which is open source LSP licensed. and the APIs are clear about getChannels, and read data from each channel, which was my first choice. but later found out it's difficult to support streamIO input, which was the best format I can reached from Amazon S3 pipeline.##### MdfInfo* read_info()* list_channels()##### Mdf* read()* filter_channel_names()* get_channel_data()* get_channel_unit()* keep_channels()```pythonimport mdfreaderfilename="basic.mf4"file_info=mdfreader.MdfInfo()channels = file_info.list_channels(filename)print(channels)file_pointer = mdfreader.Mdf(filename)keys_ = file_pointer.keys()for k_ in keys_: chan1 = file_pointer.get_channel_data(k_) print(chan1) python necessarybuffered iobinary I/O, usually used in following: 123f = open("mdf_file", "rb")type(f)<type '_io.BufferedReader'> open(file, mode=’r’, buffering=-1, encoding=…) ifmode='b', the returned content is bytes object, can’t encoding. if buffereing is not setted, for binary file with buffering with a fixed length. when with mode='w/r/t, it returns io.TextIOwrapper, if mode=rb, it returns io.BufferedReader, else if mode=wb, it returns io.BufferedWriter, else mode=wrb, it returns io.BufferedRandom. check here dict built-in funs12345678910cmp(dict1, dict2) ;len(dict)str(dict)dict.clear()dict.get(key)dict.has_key(key)dict.items()dict.keys()dict.values()pop(key) a bit about readersa few years ago I was in CAE field, dealing with different kind of CAE format, e.g .nas. during that time, there is a similar need to have a file transfer from vendor specified format to python format e.t.c. so now it comes again, which is funny to know adapters in CAE and autonomous driving design. referasammdf API doc asam mdf4 doc]]></content>
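since the MDF class listed in the entry above exposes to_dataframe(), here is a short illustrative sketch of dumping a measurement into pandas for quick inspection (the file name is just an example; a BytesIO object fetched from S3 works the same way, as noted above):

```python
import io
from asammdf import MDF

with open("basic.mf4", "rb") as f:
    buf = io.BytesIO(f.read())   # same code path works for a stream pulled from S3

mdf = MDF(buf)
df = mdf.to_dataframe()          # one column per channel, indexed by the master channel
print(df.columns.tolist())
print(df.head())
```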
</entry>
<entry>
<title><![CDATA[gpu rendering in k8s cloud]]></title>
<url>%2F2020%2F02%2F21%2Fgpu-rendering-in-k8s-cloud%2F</url>
<content type="text"><![CDATA[backgroundto deploy simulation in docker swarm, there is a hurting that GPU rendering in docker requires a physical monitor(as $DISPLAY env variable). which is due to GeForece(Quadro 2000) GPUs, for Tesla GPUs there is no depends on monitor. really intertesting to know. there were two thoughts how to deploy simulation rendering in cloud, either a gpu rendering cluster, or a headless mode in container cluster. unity renderingunity rendering pipeline is as following : CPU calculate what need to draw and how to draw CPU send commands to GPU GPU plot direct renderingfor remote direct rendering for GLX/openGL, Direct rendering will be done on the X client (the remote machine), then rendering results will be transferred to X server for display. Indirect rendering will transmit GL commands to the X server, where those commands will be rendered using server’s hardware. 12direct rendering: YesOpenGL vendor string: NVIDIA Direct Rendering Infrasturcture(DRI) means the 3D graphics operations are hardware acclerated, Indirect rendering is used to sya the graphics operations are all done in software. DRI enable to communicate directly with gpus, instead of sending graphics data through X server, resulting in 10x faster than going through X server for performance, mostly the games/video use direct rendering. headless modeto disable rendering, either with dummy display. there is a blog how to run opengl in xdummy in AWS cloud; eitherwith Unity headless mode, When running in batch mode, do not initialize graphics device at all. This makes it possible to run your automated workflows on machines that don’t even have a GPU. to make a game server running on linux, and use the argument “-batchmode” to make it run in headless mode. Linux -nographics support -batchmode is normally the headless flag that works without gpu and audio hw acceleration GPU containera gpu containeris needs to map NVIDIA driver from host to the container, which is now done by nvidia-docker; the other issue is gpu-based app usually has its own (runtime) libs, which needs to install in the containers. e.g. CUDA, deep learning libs, OpengGL/vulkan libs. k8s device plugink8s has default device plugin model to support gpu, fpga e.t.c devices, but it’s in progress, nvidia has a branch to support gpu as device plugin in k8s. nvidia docker mechanismwhen application call cuda/vulkan API/libs, it further goes to cuda/vulkan runtime, which further ask operations on GPU devices. cloud DevOpssimulation in cloudso far, there are plenty of cloud simulation online, e.g. ali, tencent, huawei/octopus, Cognata. so how these products work? all of them has containerized. as our test said, only Tesla GPU is a good one for containers, especially for rendering needs; GeForce GPU requires DISPLAY or physical monitor to be used for rendering. cloud vendors for carmakersas cloud simulation is heavily infrastructure based, OEMs and cloud suppliers are coming together: FAW -> Ali cloud GAC -> Tencent Cloud SAC -> Ali cloud / fin-shine Geely -> Ali cloud Changan -> Ali cloud GWM -> Ali cloud Xpeng, Nio (?) PonyAI/WeRide (?) obviously, Ali cloud has a major benefit among carmakers, which actually give Ali cloud new industry to deep in. traditionally, carmakers doesn’t have strong DevOps team, but now there is a change. we can see the changing, as connected vehicles service, sharing cars, MaaS, autonomous vehicles are more and more requring a strong DevOps team. but not the leaders in carmakers realize yet. 
fin-shineSAC cloud, which is a great example for other carmakers to study. basically they are building the common cloud service independently. container service <– k8s/docker storage <– hdfs/ceph big data <– hadoop/sql middleware applications <– AI, simulation it’s an IP for SAC, but on the other hand, to maintain a stable and user-friendly cloud infrastructure is not easy at all. referdocker blender render cluster nvidia vgpu using GPUs with K8s K8S on nvidia gpus openGL direct hardware rendering on Linux runinng carla without display tips for runing headless unity from a script Unity Server instance without 3d graphics a blog: nvidia device plugin for k8s k8s GPU manage & device plugin mechanism]]></content>
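as a rough illustration of the headless/batch mode discussed in the entry above, a runner could start a Unity-built simulator from Python roughly as below; the binary name is hypothetical, and -nographics is only an option when the scenario does not need camera or lidar rendering:

```python
import subprocess

# launch the Unity player without initializing the graphics device
proc = subprocess.Popen([
    "./simulator.x86_64",          # hypothetical Linux build of the simulator
    "-batchmode",
    "-nographics",
    "-logFile", "sim_headless.log",
])
print("simulator exited with", proc.wait())
```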
</entry>
<entry>
<title><![CDATA[review apollo (2)]]></title>
<url>%2F2020%2F02%2F17%2Freview-apollo-2%2F</url>
<content type="text"><![CDATA[review apollo (2)EMPlannerin review apollo(1), we had went through perception, prediction and part of planning module. the ADS planning problem can be summaried with a few input conditions, we can get a few drivable driving path, now EMPlanner, here need to find out the best one. input conditions: perception and prediction obstacles and its future(in 5s) trajectory current road status, e.g. special area, pedestrain cross e.t.c. both path and speed need to be planned or optimized. the core of EMPlanner is Dynamic programming(DP), which is an classic algorithm to search the global optimized solution. the DP garantee each sub-group best solution accumulated to the final global best solution. PathOptimizerthe DP used in path optimizer basic processes: starting from current location, in every time, move in x-direction about next n meters(to next location), and sample in y-direction at the next location, calcuate the cost from current location to each of the sampling point at the next location, and save cost value for each path for this sub path segment. (this is the different from DP to greedy algorithm), finally we retrieve all the cost values tables for each sub-path, and get the final best solution for the whole path. a few question about DP here: how long in x-direction need to plan ? KMinSampleDistance = 40.0 what size of the interval in x-direction is good ? 1step_length = math::Clamp(init_point.v() * SamplePointLookForwardTime, config_.step_length_min(), config_.step_length_max()); how many sample points in y-direction is needed ? 1num_sample_per_level = config_.sample_points_num_each_level(); how to sample in y-direction and x-direction ? which is case by case, depends on the ego’s maneuver as well as the lanes info. basically, we had RouteSegments/ReferenceLine set, which includes the filtered referencelines for ego vehicle. for each referenceLine, we can calcuate the cost, then figure out the min cost referenceline as planning module’s output. 1234567891011121314151617181920212223242526EMPlanner::Plan(&start_point, *frame){ foreach reference_line_info in frame->reference_line_info(): PlanOnReferenceLine();}EMPlanner::PlanOnReferenceLine(start_point, frame, reference_line_info){ foreach optmizer in tasks_ : ret = optmizer->Execute(frame, reference_line_info);}PathOptimizer::Execute(*frame, *reference_line_info){ ret = Process(speedData, reference_line, start_point, path_data);}DpPolyPathOptimizer::Process(SpeedData, ReferenceLine, init_point, path_data){ dp_road_graph.FindPathTunnel();}DPRoadGraph::FindPathTunnel(init_point, &obstacles, *path_data){ CalculateFrenetPoint(init_point, &init_frenet_frame_point_); GenerateMinCostPath(obstacles, &min_cost_path); foreach point in min_cost_path: frenet_path.append(transfer(point)); path_data->SetFrenetPath(frenet_path);} point2point cost calcualatione.g. from the left interval sample points(in y-direction) to current interval sample points(in y-direction). during this cost calcuation, consider pathCost, staticObstacleCost and DynamicObstacleCost. 
12345TrajectoryCost::Calculate(){ total_cost += CalculatePathCost(); total_cost += CalculateStaticObstacleCost(); total_cost += CalculateDynamicObstacleCost();} UpdateNode() 12345678910DPRoadGraph::GenerateMinCostPath(){ for(level=1; level < path_waypoints.size(); ++level){ level_points = path_waypoints[level]; //all sample points in y-direction at current level for(i=0; i<level_points.size(); ++i){ cur_point = level_points[i]; UpdateNode(prev_dp_points, level, &trajectory_cost, &cur_point); } }} PathCost from each point at prevous level to current point at current level, consider the length, 1st order(speed), 2sec order(acc) should be smooth, can give a 5th-order polyfit, same as what’d done in referenceLine smoother. for each interval/curve polyfit(by curve.Evaluate()), we can re-sampling it more density, so at each new sampling point, calculate its (position, speed, acc)l, dl, ddl 123456789101112131415161718192021222324252627282930TrajectoryCost::CalculatePathCost(curve, start_s, end_s, curr_level){ foreach sampling_point in (end_s - start_s): l = curve.Evalaute(0, sampling_point) // cost from l path_cost += l * l * config_.path_l_cost() // cost from dl dl = curve.Evaluate(1, sampling_point) path_cost += dl * dl * config_.path_dl_cost() // cost from ddl ddl = curve.Evaluate(2, sampling_point) path_cost += ddl * ddl * config_.path_ddl_cost() }``` * StaticObstacleCostStaticObstacleCost() used the same idea from PathCost, basically, to resampling the interval, and calculate a cost on each of the sampling point. ```c++TrajectoryCost::CalculateStaticObstalceCost(curve, start_s, end_s){ foreach curr_s in (end_s - start_s): curr_l = curve.Evaluate(0, curr_s, start_s); foreach obs_boundary :: static_obstacle_boundaries: obstacle_cost += GetCostFromObs(curr_s, curr_l, obs_boundary);}TrajectoryCost::GetCostFrmObs(){ delta_l = fabs(adc_l - obs_center); obstacle_cost.safety_cost += config_.obstacle_collision_cost() * Sigmoid(config_.obstacle_collision_distance() - delta_l) ; DynamicObstacleCost 1234567891011TrajectoryCost::CalculateDynamicObstacleCost(curve, start_s, end_s){ foreach timestamp: foreach obstacle_trajectory in dynamic_obstacle_boxes_: obstacle_cost += GetCostBetweenObsBoxes(ego_box, obstacle_trajectory.at(timestamp));}TrajectoryCost::GetCostBetweenObsBoxes(ego_box, &obs_box){ distance = obstacle_box.DistanceTo(ego_box); if(distance > config_.obstacle_ignore_distance()) return obstacle_cost ; obstacle_cost.safety_cost += ( config_.obstacle_collision_cost() + 20) * Sigmoid(config_.obstacle_collision_distance() - distance); static obstacles cost calculating is like one space-dimension, only consider cost from the fixed/static obstacle to the fixed sampling points. for dynamic obstacles case, as the obstacle is moving, so we need a time-dimension, to calculate the cost at each time interval, from the current obstacle location to the fixed sampling points. for dynamic obstacle cost, there is a risk cost (20), which need to take care. min cost pathwe got the cost matrix from the start_s level to the target_s/final level. but the final level in y-direction has 7 sampling points, all of which are drivable. so backward propagation, to get the best path (or a NODE in DAG) 1234DPRoadGraph::GenerateMinCostPath(){ for(cur_dp_node : graph_nodes.back()){ fake_head.UpdateCost() } SpeedOptimizerthe previous PathOptimier give the best (s, l) location of ego should arrived at each interval section along the referenceline. so what’s the best speed to reach these interval section ? 
to do speed optimzier, two parts: constraint speed conditions, e.g. speed limit, upper/lower boundary of drivable area along the refereneline due to obstacle trajectory speed planning BoundaryMapper()for each obstalce’s trajectory, sampling in time-dimension, if its trajectory has overlapped with ego’s trajectory, mark the boundary box. 123456789101112131415161718StBoundaryMapper::CreateStBoundary(){ foreach trajectory_point in trajectory: obs_box = obstacle.GetBoundingBox(trajectory_point); for(path_s=0.0, path<s < discretized_path.Length(); path_s += step_length){ curr_adc_path_point = discretized_path.EvaluateUsingLinearApproximation(); if( CheckOverlap(curr_adc_path_point, obs_box, st_boundary_config_.boundary_buffer()) ) { } }}// only when obs trajectory overlapped with ego's trajectoryStBoundaryMapper::MapWithDecision(path_obstacle, decision){ boundary = StBoundary::GenerateStBoundary(lower_points, upper_points); path_obstacle->SetStBoundary(boundary); // assign different boundary type } SpeedLimitDecider speed limit centripetal acc limit centripetal force limit DpStGraph::Searchso far we have info from boundaryMapper, which tells lower_s and upper_s of safe driving area along reference line, excluding the overlaping area of obstacle’s trajectories; as well as speed limit. so now in the driving safe area, which speed ego should take at each point is also done by DP, speed can be represent in time-space two dimension, in a T-S table. so speed DP can transfer to find the min cost [T,S] althrough the whole planning drive path. 123456789101112131415161718192021222324252627282930313233DpStGraph::Search(speed_data){ InitCostTable().ok(); CalculateTotalCost().ok(); RetriveSpeedProfile(speed_data).ok();}``` ### QpSplineSpeedOptimizer DPSpeedOptimizer output the value pair of (time, location)(t, s) at each interval, then can get the velocity, acc, angular velocity by differencial among the neighbor intervals. QPSpline is a better way to get a polyfit 2nd-order smoothed speed profile along the referenceLine, which is higher-precision than DP.### SpeedOptimizer and PathOptimizer fusionPathOptimizer outputs driving path in (accumulated_distance, section_direction distance)(s, l), SpeedOptimizer outputs driving path in (time, accumulated_distance)(t, s). now to get the best/optmized path along the referenceLine, need combine these two outputs.```c++for(cur_rel_time=0.0; cur_rel_time < speed_data_.TotalTime(); cur_rel_time += TimeSec){ speed_data_.EvaluateByTime(cur_rel_time, &speed_point); if(speed_point.s() > path_data_.discretized_path().Length()) break; path_data.GetPathPointWithPathS(speed_point.s(), &path_point); path_point.set_s(path_point.s() + start_s); trajectory_point.mutable_path_point()->CopyFrom(path_point); trajectory_point.set_v(speed_point.v()); trajectory_point.set_a(speed_point.a()); trajectory_point.set_relative_time(speed_point.t() + relative_time); ptr_discretized_trajectory->AppendTrajectoryPoint(trajectory_point); } the above fusion is on one ReferenceLine, for all referenceLine, we can reach out the min cost referenceline as following: Frame::FindDriveReferenceLineInfo(){ foreach reference_line_info in reference_line_info_ : if(reference_line_info.IsDrivable() && reference_lien_info.Cost < min_cost){ drive_reference_line_info_ = &reference_line_info; min_cost = reference_line_info.Cost(); } return drive_reference_line_info_ ; } Planning in summaryPlanning module has three components, ReferenceLineProvider, Frame and EMPlanner. 
Frame has ReferencelineInfo, which has everything about once-time planning, including the min-cost ReferenceLineInfo from EMPlanner. EMPlanner do DP and QP for PathPlan and SpeedPlan, seperately, which give a best driving path for each RefereneLine. in the whole, we can reach out the global best(min-cost) referenceLine from EMPlanner, and store back in Frame. ControlControl module readin the planning ReferenceLine (Trajectory, VehicleStatus) and Localization, output control command. ControlComponent::Proc(){ chassis_msg = chassis_reader_->GetLatestObserved(); OnChassis(chassis_msg); trajectory_msg = trajectory_reader_-> GetLatestObserved(); localization_msg = localization_reader_-> GetLatestObserved(); status = ProduceControlCommand(&control_command); control_cmd_writter_->Write(); } simulationsimulation is a system level verification/test tool, there are a few topics: simulate the worldhow to make a virtual world in simulation, quickly, easily, close to real and massively deploy-friendly. currently most commericial tools, e.g. PreScan, CarMaker, are supporting well real map(.osm), and on-line/distributed/cloud service, which is good way. a few pioneers are trying to make the virtual world by 3D build through image/point cloud scanning. which is maybe the closest way to the physical world, but need large computing resource. integrate ADS systemhow to integrate ADS system to simulation, easily. most simulation software/platform is independent from ADS core. for in-house prototype team, they both develop ADS and simulate in Matlab. which is pretty common in most OEM teams. in product phase, it’s common to package ADS core as ros node, and talk to the simulation software by ROS. friendly APIsystem level simulation has plenty of scenarios, requiring the simulation platform has friendly way to produce massively scenarios. including env, ego/npc behaviors, special cases e.t.c most common APIs are Python and Matlab. massively deploymentit’s back to the test purpose, simulation needs to cover as much as possible scenarios, which requires IT infrastructure to support massively deployment. e.g. cloud deployment `]]></content>
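the DP idea behind the PathOptimizer in the entry above can be reduced to a tiny toy example: levels of sampled lateral offsets along s, an edge cost between samples of neighboring levels, and a backward pass to recover the min-cost path. the sketch below is a simplified stand-in with a made-up cost function, not Apollo code:

```python
# toy dynamic programming over sampled lateral offsets, one list per s-level
levels = [[0.0], [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [0.0]]

def edge_cost(l_prev, l_cur):
    # made-up cost: penalize lateral offset and abrupt lateral jumps
    return l_cur ** 2 + 5.0 * (l_cur - l_prev) ** 2

# forward pass: best cost to reach every sample, plus back-pointers
best = [{0: 0.0}]
back = [{}]
for lvl in range(1, len(levels)):
    cur_cost, cur_back = {}, {}
    for j, l_cur in enumerate(levels[lvl]):
        cands = [(best[-1][i] + edge_cost(levels[lvl - 1][i], l_cur), i)
                 for i in best[-1]]
        c, i = min(cands)
        cur_cost[j], cur_back[j] = c, i
    best.append(cur_cost)
    back.append(cur_back)

# backward pass: recover the min-cost sequence of lateral offsets
j = min(best[-1], key=best[-1].get)
path = []
for lvl in range(len(levels) - 1, -1, -1):
    path.append(levels[lvl][j])
    j = back[lvl].get(j, 0)
print(list(reversed(path)))
```

the real planner replaces the made-up edge cost with the path, static-obstacle and dynamic-obstacle costs described above, and fits a 5th-order polynomial between levels instead of straight hops.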
<tags>
<tag>apollo</tag>
</tags>
</entry>
<entry>
<title><![CDATA[review apollo (1)]]></title>
<url>%2F2020%2F02%2F15%2Freview-apollo-1%2F</url>
<content type="text"><![CDATA[dig into Apollo/Daohu527 and YannZ/Apollo-Notes are some good summary of Apollo. localizationrtk!image 12345678910111213141516171819RTKLocalizationComponent::InitIO(){ corrected_imu_listener_ = node_ ->CreateReader<localization::CorrectImu>(); gps_status_listener_ = node_ -> CreateReader<driver::gnss::InsStat>();}RTKLocalizationComponent::Proc(& gps_msg){ localization_->GpsCallback(gps_msg); if(localization_->IsServiceStarted()) { LocalizationEstimate localization; localization_->GetLocalization(&localization); localization_->GetLocalizationStatus(&localization_status); PublishPoseBroadcastTopic(localization); } } ndtnormal distribution transform(NDT) is to match our sensors view points compare to what we see on an existing map. due to the view points from sensor may be a little off from the map, or the world might change a little between when built the map and when we record the view points. so NDT will try to match view points to a grid of probability functions created from the map, instead of the map itself. msfperceptionmodule overview1234567891011121314151617181920212223242526272829303132333435363738394041424344void Perception::RegistAllOnboardClass() { RegisterFactoryLidarProcessSubnode(); RegisterFactoryRadarProcessSubnode(); RegisterFactoryFusionSubnode(); traffic_light::RegisterFactoryTLPreprocessorSubnode(); traffic_light::RegisterFactoryTLProcSubnode();}``` the data flow in perception has two ways, either ROS message send/subscribe, or shared data. Apollo design a `global map` data structure:```c++GlobalFactoryMap<string, map<string, ObjectFactory*> >``` the first string element can be `SubNode` or `ShareData`.e.g. * lidar share data can stored as: `GlobalFactorMap[sharedata][LidarObjectData]`* radar subnode data can stored as: `GlobalFactorMap[Subnode][RadarObjectData]`#### DAG processafter ros subnode and shared data registered(not instance), [perception module create a DAG](https://github.com/YannZyl/Apollo-Note/blob/master/docs/perception/perception_software_arch.md)(directional acyclic graph) process, which includes subNode, Edge, and ShareData. `Edge` defines start node and end node of the data flow. e.g. LidarProcessSubnode -> FusionSubnode.* subnode init ```c++DAGStreaming::InitSubnodes(){ for(auto pair: subnode_config_map) { Sunode* inst = SubnodeRegisterer::GetInstanceByName(subnode_config.name()); inst->Init() ; } } edge init 1234DAGStreaming::Init(){ event_manager_.Init(dag_config_path.edge_config());} it’s easy to understand EventManager is used to register edge, as the data flow either by ros node or by share data is event driven. inside EventManager there are two variables: event_queue_map <eventID, message_queue> event_meta_map <eventID, message_info> shareData init 1234DAGStreaming::Init(){ InitSharedData(dag_config.data_config());} each subnode and shareData is a separate ROS/data thread, which started after DAG process initialization finished, the run() process is the event loop: 1234567891011121314151617181920DAGStreaming::Schedule(){ for(auto& pair: subnode_map_) { pair.second->Start(); } for(auto& pair : subnode_map_) { pair.second->Join(); }}Subnode::Run(){ while(!stop) { status = ProcEvents(); }} Lidar process hdmap ROI filter. basically compare the lidar cloud points to hdmap area, and filter out the cloud points outside of ROI. CNN image segmentation, basically classfiy the point clouds as obstacles based on CNN. 
obstacle minBox obstalce tracking lidar fusion Lidar & Radar fusionthe data fusion ros node proces in the following way: step1: collect sensor’s data 1234567891011121314151617181920212223242526272829303132333435363738394041424344FusionSubnode::Process(){ BuildSensorObjs(events, &sensor_objs); fusion_->Fusion(sensor_objs, &objects+);}FusionSubnode::BuildSensorObjs(&events, std::vector<SensorObjects> * multi_sensor_objs){ foreach event in events: { if( event.event_id == lidar ) { sensor_objects.sensor_type = VELODYNE ; } else if ( event.event_id == radar){} else if (event.event_id == camera) {} else{return false;} } multi_sensor_objs->push_back(*sensor_objects); } ``` step2: process sensors data```c++ProbabiliticFusion::Fuse(multi_sensor_objs, *fused_objs){ sensor_mutex.lock(); foreach sensor_objs in multi_sensor_objs: { sensor_manager_->AddSensorMeasurements(sensor_objs); } sensor_manager_->GetLatestFrames(fusion_time, &frames);} sensor_mutex.unlock();sensor_mutex.lock();foreach frame in frames: FuseFrame(frame)CollectFusedObjects(fusion_time, fused_objects);sensor_mutex.unlock(); the data structure for sensor_type, timestamp and perceptioned obj is: std::map<string sensorType, map<int64, SensorObjects> > sensors_; sensors_[sensor_id][timestamp] return the sensor’s object at a special timestamp for Lidar and Radar fusion, we get a frame list, each frame has the object list from both Lidar and Radar. basically, so far it is just matching Lidar objects to Radar objects, assiged to timestamp. Lidar objects and Radar objects are indepenent. predictionin Apollo, prediction module anticipatd the future motion trajectories of the perceived obstacles. prediction subscribe to localization, planning and perception obstacle messages. when a localization update is received, the prediction module update its internal status, the actual prediction is triggered when perception sends out its perception obstacle messages, as following code : 123MessageProcess:OnLocalization(*ptr_localization_msg);MessageProcess:OnPlanning(*ptr_trajectory_msg)MessageProcess:OnPerception(*ptr_perception_msg, &prediction_obstacles) take a detail review of OnPerception as following: 12345678910MessageProcess:OnPerception(perception_obstacles, prediction_obstacles){ ptr_obstalces_container = ContainerManager::Instance()->GetContainer<ObstaclesContainer>(); ptr_trajectory_container = ContainerManager::Instance()->GetContainer<ADCTrajectoryContainer>(); EvaluatorManager::Instance()->Run(ptr_obstacles_container); PredictorManager::Instance()->Run(perception_obstacles, ptr_trajectory_container, ptr_obstacles_container) ContainerManagerused to instancialize an instance of the special container, which is used to store the special type of message. there are three types of contianer: ObstaclesContainer used to store obstalce info and its lane info. obstacle info coming from perception; its lane info coming from hd map and LaneGraph, LaneSequence and LaneSegment. check the explanation of ObstacleContainer LaneGraph is namely the lane network. e.g. the pre/post lane, neighbor lane, or where the lane disappear. LaneSegment is the element/piece of discretization of the lane network, which has a start point and end point. LaneSequence is a set of all possible next lane choices from current lane, which is only based on the lane network and current obstacle velocity, rather than from planning. 
PoseContainer used to store vehicle world coord and the coord transfer matrix, velocity info e.t.c ADCTrajectoryContainer EvaluatorManagerwhat evaluator does, is to compute the probability of each laneSequence, the obstacle is either vehicle, or pedestrain or bicycle. to compute probability is with multi-layer predictor(MLP) model: 123456MLPEvaluator::Evaluate(obstacle_ptr){ foreach lane_sequence_ptr in lane_graph->lane_sequence_set: ExtractFeatureValues(obstacle_ptr, lane_sequence_ptr, &feaure_values) probability = ComputeProbability(feature_values); } ObstacleFeature there are 22 dimensions, which are greate resources to understand a good prediction model in ADS. these features includes, dimension, velocity, acc, direction of the obstacle, and relative position, angle to boundary lane e.t.c. LaneFeature since the laneSequence is already discritized as LaneSegment, for each lane Segment, Apollo calculate 4 features: the angle from lanePoint position to its direction vector; obstacle’s projected distance to laneSequence; the direction of lanePoint; the direction angle from lanePoint’s direction to obstacle’s direction. for each LaneSequnce only calculate its first 40 features, namely the first 10 LaneSegment. from 22 Obstacle features and 40 lane features, compute the probability of each LaneSequence. PredictorManagerthis is where we answer the predict question, where in the next 5 sec the obstacle located. after EvaluatorManager(), we get the probability of each LaneSequence. then choose the few LaneSequences with high probability, and predict a possible obstacle motion for each high-probability LaneSequence as well as with ADCTrajectory info from planning module. 123456789101112131415161718192021222324252627PredictorManager::Run(perception_obstalces){ foreach obstacle in perception_obstacles: predictor->Predict(obstacle) foreach trajectory in predictor->trajectories(): prediction_obstacle.add_trajectory(trajectory) prediction_obstacles.add_prediction_obstalce(prediction_obstalce)}``` basically for each obstacle, we do [prediction its next few seconds position](https://github.com/YannZyl/Apollo-Note/blob/master/docs/prediction/predictor_manager.md). there are a few predictors: MoveSequencePredictor, LaneSequencePredictor, FreeMovePredictor.```c++MoveSequencePredictor::Predict(obstacle){ feature = obstalce->latest_feature(); num_lane_sequence = feature.lane_graph().lane_sequence_size() FilterLaneSequence(feature, lane_id, &enable_lane_sequence) foreach sequence in feature.lane_sequence_set { DrawMoveSequenceTrajectoryPoints(*obstacle, sequence, &points) trajectory = GenerateTrajectory(points) trajectory.set_probability(sequence.probability()); trajectories_.push_back(trajectory) }} why need prediction modulewhether including motion prediction or not is based on the computing ability of ADS system. for low power system, the planning module update really high frequency, then there is no prediction, or prediction only need be considered in less than a few mic-sec; but for complex ADS system with low frequency update of planning, the ability to forsee the next few secs is very helpful. Planning!image VehicleStateProvider12345678class VehicleStateProvider { common::VehicleState vehicle_state_ ; localization::LocalizationEstimate original_localization_ ; Update(); EstimateFuturePosition();} P&C Mapupdate routing responseduring PnC map, we need to query waypoints(in s, l coord), e.g. current vehicle position, target position, in which routing laneSegment. 
get laneSegment info from routing response 12345678PncMap::UpdateRoutingResponse(routing){foreach road_segment in routing.road() : foreach passage in road_segment.passage(): foreach lane in passage.segment(): { all_lane_id.insert(lane.id()); }} query waypoints 123RouteSegments::WithinLaneSegment(&lane_segment, &waypoint){ return lane_segment.lane->id().id() == waypoint.id() && waypoint.s() >= lane_segment.start_s && waypoint.s() <= lane_segment.end_s} with UpdateRoutingResponse(), basically trafer waypoint in s,l coord to a more readable representation: [route_index, roadSegid, PassageID, laneSegmentID] generate RouteSegmentsbased on current vehicle status and routing response, get all driving-avaiable path set, each RouteSegment is one driving-avaiable path in a short period. The length of each RouteSegment depends on the both backward(30m by default) lookup length and forward lookup length, which depends on ego vehicle velocity, a time threashold(8s by default), min_distance(150m by default), max_distance(250m by default). check here UpdateVehicleState() it’s not about update vehicle velocity e.t.c, but find out vehicle current location adc_waypoint_ in the routing path and the next lane index next_routing_waypoint_index_. 123451. based on current vehicle velocity(x,y) and heading direction, lookup hdmap to find out all possible lanes, where current vehicle is on2. check if any lane from the possible lanes set, belong to any lanes from routingResponse.road(), the output is the filtered lanes set, on which the vehicle is and which belongs to routingResponse. 3. calculate vehicle projection distance to each lane in the filtered lanes set, the lane with min projection distance is the target/goal lane GetNeighborPassages() basically, check the next connected and avaiable channel. for situations, e.g. next cross lane is left turn, or lane disappear 123456789101112131415161718192021222324GetNeighborPassages(road, passage_index){ if (routing::FORWARD) { // keep forward, return current passage } if (source_passage.can_exit()) { // current passage disappear } if (IsWaypointOnSegment(next_routing_waypoint)) { // next routing waypoint is in current lane } if(routing::LEFT or routing::RIGHT){ neighbor_lanes = get_all_Left/Right_neighbor_forward_lane_id() foreach passage in road: foreach segment in passage: if(neighbor_lanes.count(segment.id()){ result.emplace_back(id); break; } return result; } GetRouteSegments() once we get the neighbor channels/passenges, the last step is to check if these passenges are drivable, only when current lane where ego vehicle located is the same lane or exactly next one lane when the vehicle projected on the passenge, and make sure the same direction. all other situations are not drivable. additionaly add forward and backward segments will give current RouteSegments. tips, passage is the channel where vehicle drive, or lane in physical road. but we use only LaneSegment, there is no keyword Lane. from RouteSegment to road sample pointsRouteSegment is similar as LaneSegments from hdMAP, including laneID, start_s, end_s; with additional mapPathPoint info, e.g. heading direction, and other traffic area property, which is used to lookup HD map. rather than LaneSegments is about 2m, RouteSegment length can be much longer, including a few LaneSegments. if each LaneSegment provides a sample point, and packaging these smaple points as MapPathPoint, then RouteSegments can be represented as a list of MapPathPoint. 
and in reality, both start_point and end_point of each LaneSegemnt also added as a sample point, but need take off overlap points. 1PnCMap::AppendLaneToPoints() each two mapPathPoint again can group as a LaneSegment2d, and for the LaneSegment2d in same lane, can joining as one LaneSegment: 123456Path::InitLaneSegments(){ for(int i=0; i+1<num_points_; ++i){ FindLaneSegment(path_points_[i], path_points_[i+1], &lane_segment); lane_segments_.push_back(lane_segment); LaneSegment.Join(&lane_segments);} each small piece of LaneSegment2d helps to calculate the heading direction. 123456789101112131415161718192021Path::InitPoints(){ for(int i=0; i<num_points_; ++i){ heading = path_points_[i+1] - path_points[i]; heading.Normalize(); unit_directions_.push_back(heading); }}``` tips, LaneSegment2d is the small piece about 2m, LaneSegment is a larger segment, and has accumulated_start_s, and accumulated_end_s info.so far, we have LaneSegment and MapPathPoints set to represent the RouteSegment. the MapPointPoint is about 2m each, Apollo(why?) density it about 8 times to get a set of sample path points, which is about 0.25m each.```c++Path::InitWidth(){ kSampleDistance = 0.25 ; num_sample_points_ = length_ / kSampleDistance + 1 ; for(int i=0; i<num_sample_points_; ++i){ mapPathPoint_ = GetSmoothPoint(s); s += kSampleDistance ; }} finally, RouteSegment has traffic zone info, e.g. cross road, parking area e.t.c ReferenceLineProviderReferenceLineProvider has two funcs, check here: smoothing sample path_points to get a list of anchor points, which is the exactly vehicle driving waypoints. from path_points to anchor_points is resampling, with a sparse(5m) sampling distance, as well as additionally consider one-side driving correction factor. taking an example about one-side driving correction factor,: when driving on left curve, human driver keeps a little bit left, rather than keeping in the centerline. piecewise smoothing from anchor points the basic idea is split anchor points in n subgroups, and polyfit each subgroup; for the subgroup connection part need to make sure the zero-order, first order and second order differential smooth. which in final return to an optimization problem with constraints. Framethe previous PnCMap and ReferenceLine is in an ideal driving env, Frame class considers obstacles behavior, and traffic signs. check here obstacle lagPredictionpredictionObstacle info is filtered by the following two cases, before use as the obstacle trajectory for planning. for latest prediction obstacles only if perception_confidence is large than confidence_threshold(0.5 by default) and the distance from the prediction obstacle to ego vehicle is less than distance_threshold(30m by default), then this obstacle is considered for history prediction obstacles only if the prediction_obstacles has more than 3 obstacles. as well as each obstacle appear more than min_appear_num(3 by default), and for the same obstacle, the timestamp distance from its previous prediction to latest prediction is not longger than max_disappear_num(5 by default), then this obstcle need take considerion. relative position from ego vehicle to obstacleas we get the obstacles’ LagPrediction trajectory, as well as ReferenceLine for ego vehicle. now we combine this two information, to understand when the ego is safe and how far ego can drive forward. the output is the referenceline with each obstacle overlapped info. 
including the time low_t when overlap begins, and the time high_t when overlap ends; and the start location low_s-start and end location high_s-start of the overlap. rule based behavior to handle overlap areathere are 11 rule based behavior. backside_vehicle change_lane crosswalk destination front_vehicle keep_clear pull_over reference_line_end rerouting signal_light stop_sign Planning::RunOnce() { foreach ref_line_info in frame_->reference_line_info()){ traffic_decider.Init(); traffic_decider.Execute(&ref_line_info); } }]]></content>
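as a much-simplified stand-in for what the predictor in the entry above outputs, the sketch below rolls an obstacle forward for 5 s with a constant-velocity model; Apollo's MoveSequence/LaneSequence predictors are lane-aware and far richer, so treat this purely as an illustration of the trajectory data a predictor hands to planning:

```python
from dataclasses import dataclass

@dataclass
class Obstacle:
    x: float      # position in a map frame [m]
    y: float
    vx: float     # velocity [m/s]
    vy: float

def predict_constant_velocity(obs, horizon_s=5.0, dt=0.1):
    """Return a list of (t, x, y) points over the prediction horizon."""
    steps = int(horizon_s / dt)
    return [(k * dt, obs.x + obs.vx * k * dt, obs.y + obs.vy * k * dt)
            for k in range(steps + 1)]

trajectory = predict_constant_velocity(Obstacle(x=10.0, y=-2.0, vx=3.0, vy=0.5))
print(trajectory[:3], "...", trajectory[-1])
```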
<tags>
<tag>apollo</tag>
</tags>
</entry>
<entry>
<title><![CDATA[ads ros and simulator in one docker]]></title>
<url>%2F2020%2F01%2F15%2Fads-ros-and-simulator-in-one-docker%2F</url>
<content type="text"><![CDATA[backroundpreviously, we had test to run ads ros in one docker container, and lgsvl simulator in another docker container. once they are hosted in the same host machine, the ADS ros nodes and lgsvl can communicate well. based on this work, now it’s the time to integrate ADS ros nodes into the lgsvl docker. the basic idea of how to combine multi docker images into one, is use multistage-build, which I call “image level integration”. image level integrationbasically we have both ads_ros image and lgsvlsimulator image already, and there are a few components from ads_ros can be imported to lgsvlsimulator container: 123456FROM ads_ros:latest AS ADSFROM lgsvlsimulator:latestRUN mkdir /catkin_wsCOPY --from=ADS /root/catkin_ws /catkin_wsCOPY --from=ADS /opt/ros /opt/ros CMD ["/bin/bash"] the problem of image level integration, it actually miss some system level components: /etc/apt/sources.list.d/ros-latest.list, which then can’t update ros modules; other components, e.g. which are installed during building ads_ros image by apt-get install, which are go the system lib path, which of course can distinct out, and copy to lgsvlsimulator, but which is no doubt tedious and easy to miss some components. component level integrationas ads_ros is really indepent to lgsvlsimulator, so another way is use lgsvlsimulator as base image, then add/build ros component and ads_ros compnents inside. FROM ros:kinetic AS ROS # http://wiki.ros.org/kinetic/Installation/Ubuntu FROM lgsvlsimulator:latest RUN mkdir -p /catkin_ws/src COPY --from=ROS /opt/ros /opt/ros COPY --from=ROS /etc/apt/sources.list.d/ros1-latest.list /etc/apt/sources.list.d/ros1-latest.list # ADD ros key RUN apt-key adv --keyserver 'hkp://keyserver.ubuntu.com:80' --recv-key C1CF6E31E6BADE8868B172B4F42ED6FBAB17C654 ## -------- install ads ros packages in lgsvlsimulator container ------------ ## ENV CATKIN_WS=/catkin_ws ENV ROS_DISTRO=kinetic ## https://docs.ros.org/api/catkin/html/user_guide/installation.html RUN APT_INSTALL="apt-get install -y --no-install-recommends" && \ apt-get update && \ DEBIAN_FRONTEND=noninteractive $APT_INSTALL \ build-essential \ apt-utils \ ca-certificates \ psmisc \ cmake \ python-catkin-pkg \ ros-${ROS_DISTRO}-catkin \ ros-${ROS_DISTRO}-tf \ ros-${ROS_DISTRO}-turtlesim \ ros-${ROS_DISTRO}-rosbridge-suite \ iputils-ping \ net-tools # RUN source ~/.bashrc # copy ads ros into ws COPY /ads_ros/src $CATKIN_WS/src ### build msgs RUN /bin/bash -c '. /opt/ros/${ROS_DISTRO}/setup.bash; cd ${CATKIN_WS}; catkin_make --pkg pcl_msgs autoware_msgs nmea_msgs ' ### build ros nodes RUN /bin/bash -c '. /opt/ros/${ROS_DISTRO}/setup.bash; cd ${CATKIN_WS}; catkin_make ' # copy ros scripts COPY /ads_ros/script_docker $CATKIN_WS/script ###--------finished ads ros package -------------- ### CMD ["/bin/bash"] runtime issuewith the dockerfile above, we can build the docker image which include both lgsvl and ads ros. one runtime issue is due to lgsvl scenario is run with python3, while our ads ros, especially ros_bridge_launch is based on python2. so need some trick to add python2 at $PATH before python3 when launch ros_bridge, then exchange back.]]></content>
<tags>
<tag>lgsvl</tag>
<tag>ros</tag>
<tag>Docker</tag>
</tags>
</entry>
<entry>
<title><![CDATA[ads ros and lgsvl talk in dockers]]></title>
<url>%2F2020%2F01%2F14%2Fads-ros-and-lgsvl-talk-in-dockers%2F</url>
<content type="text"><![CDATA[backgoundpreviously, we had integrated ads ros nodes into one launch file. now we try to put ads nodes into docker, and talk to lgsvl simulator in another docker. run dockerimage12docker build -t ads_ros . docker run -it ads_ros /bin/bash unable to run cakin_make in dockerfile1234567891011121314151617181920212223242526272829303132333435363738394041424344RUN /bin/bash -c '. /opt/ros/kinetic/setup.bash; cd <into the desired folder e.g. ~/catkin_ws/src>; catkin_make'``` #### dockerfile for ads ros``` FROM ros:kinetic # create local catkin workspace ENV CATKIN_WS=/root/catkin_wsENV ROS_DISTRO=kineticRUN mkdir -p $CATKIN_WS/src## install catkin_make ## https://docs.ros.org/api/catkin/html/user_guide/installation.htmlRUN APT_INSTALL="apt-get install -y --no-install-recommends" && \ apt-get update && \ DEBIAN_FRONTEND=noninteractive $APT_INSTALL \ build-essential \ apt-utils \ ca-certificates \ psmisc \ cmake \ vim \ python-catkin-pkg \ ros-${ROS_DISTRO}-catkin \ ros-${ROS_DISTRO}-tf \ ros-${ROS_DISTRO}-turtlesim \ ros-${CATKIN_WS}-rosbridge-suite iputils-ping \ net-tools ### add third-party headers# RUN source ~/.bashrc # copy ads ros into wsCOPY /src $CATKIN_WS/src### build msgs RUN /bin/bash -c '. /opt/ros/${ROS_DISTRO}/setup.bash; cd ${CATKIN_WS}; catkin_make --pkg pcl_msgs pb_msgs autoware_msgs mobileye_msgs ibeo_msgs nmea_msgs ' ### build ros nodes RUN /bin/bash -c '. /opt/ros/${ROS_DISTRO}/setup.bash; cd ${CATKIN_WS}; catkin_make '# copy ros scriptsCOPY /script $CATKIN_WS/scripts # run ros shellWORKDIR ${CATKIN_WS}/scripts ros envsset ROS_IP on all involved containers to their IP, and set ROS_MASTER_URI to the IP of the roscore container. That would avoid the DNS problem. understand ros environment variables $ROS_ROOT : set the location whre the ROS core packages are installed $ROS_MASTER_URI : a required setting that tells nodes where they can locate the master $ROS_IP or $ROS_HOSTNAME : sets the declared network address of a ROS node get docker container’s IPdocker inspect -f "{{ .NetworkSettings.Networks..IPAddress }}" <container_name||container_id> tips, network_name: e.g. host, bridge, ingress e.t.c. with docker host net, then the container doesn’t have its own IP address allocated, but the application is available on the host’s IP address with customized port. run roscore in docker and talk to rosnodes at host start roscore from docker container as following : sudo docker run -it –net host ads_ros /bin/bash roscore start other ros nodes at host 1234567891011121314151617181920212223rosnode list ## >>> /rosout source $ROS_PACKAGE/setup.shrosrun rtk_sensor rtk_sensor ### run successfully ``` how to understand ? once the docker container start with `host network`, the `roscore` run inside docker, is same as run in the host machine !! #### ads ros in docker talk to lgsvl in another docker with HOST network * once we start ads ros docker as following: ```shellsudo docker run -it --net host ads_ros /bin/bashroscore ``` * start lgsvl in docker: ```shell#! /usr/bin/env bashxhost + sudo nvidia-docker run -it -p 8080:8080 -e DISPLAY=unix$DISPLAY --net host -v /tmp/.X11-unix:/tmp/.X11-unix lgsvlsimulator /bin/bash then access webUI in host machine, and add host: 10.20.181.132 in Clusters page, and add 10.20.181.132:9090 for Selected Vehicles. as lgsvl is also in host network. so these two docker can communicate through ros well !! 
ads ros container talk to lgsvl container with ROS_IPsince lgsvl will run in docker swarm env, we can’t depend on host network, which requires ROS_IP env. the following test is in one host machine. in host terminal 123456789101112131415161718192021222324252627282930313233343536export ROS_MASTER_URI=http://192.168.0.10:11311export ROS_HOSTNAME=192.168.0.10export ROS_IP=192.168.0.10 roscore ``` * in ads ros docker ```shellsudo docker run -it \ --env ROS_MASTER_URI=http://10.20.181.132:11311 \ --env ROS_IP=10.20.181.132 \ ads_ros /bin/bashrosnod list ``` however, when start `roscore` in docker, it reports:```shellUnable to contact my own server at [http://10.20.181.132:33818/].This usually means that the network is not configured properly.A common cause is that the machine cannot ping itself. Please checkfor errors by running: ping 10.20.181.132``` if checking the IP address inside the docker container by `ifconfig`, which reports `172.17.0.3`, which then make sense that the container can't talk to `10.20.181.132`, which means we can't assign a special IP address for a docker container. so reset in the docker container as:```shellexport ROS_MASTER_URI=http://172.17.0.3:11311export ROS_HOSTNAME=172.17.0.3 actually, the host terminal can talk to the ads ros container directly, with no need to set $ROS_HOSTNAME & $ROS_MASTER_URI specially; as well as another docker container in this host machine, e.g. lgsvl. a little bit knowledge about docker network. so each docker container does have an virtual IP, e.g. 172.17.0.1. while if run the docker image with host network, there is no special container IP, but the container directly share the IP of the host machine. as multi docker containers run in the same host machine, even without host network, they are in the same network range, so they can communicate to each other. additionaly for ros_master, which may requires to add $ROS_HOSTNAME & $ROS_MASTER_URI. start lgsvl in another docker 123456789#! /usr/bin/env bashxhost + sudo nvidia-docker run -it \ -p 8080:8080 \ -e DISPLAY=unix$DISPLAY \ -v /tmp/.X11-unix:/tmp/.X11-unix \ --env ROS_MASTER_URI=http://172.17.0.3:11311 \ --env ROS_HOSTNAME=172.17.0.3 lgsvlsimulator /bin/bash in summaryso far, we have host ads ros in one docker, and lgsvl in another docker, and they are in the same machine, and they can talk to each other. the next thing is to put ads ros and lgsvl in one image. referrosdep docker 1.10 container’s IP in LAN listening to ROS messages in docker containers exposing ROS containers to host machine why you need IP address of Docker container catkin_make not found in dockerfile]]></content>
<tags>
<tag>lgsvl</tag>
<tag>ros</tag>
<tag>Docker</tag>
</tags>
</entry>
<entry>
<title><![CDATA[ROS ADS integrated to python talk to lgsvl]]></title>
<url>%2F2020%2F01%2F10%2FROS-ADS-integrated-to-python-talk-to-lgsvl%2F</url>
<content type="text"><![CDATA[backgroundpreviously, we had done ADS ros package directly talk with lgsvl. the step further is to trigger ADS ros nodes from scenario python script a few requirements: for ADS ros should be integrated with each scenario python script, which means, when the python script finished, the ADS ros should exit the ads ros package shell script is independ of python script execute shell in Python to call a shell command directly with os.system(cmd) or 1subprocess.call("ls -lt", shell=True) to run shell script with a python subprocess 1proc=subprocess.Popen([""], stdout=subprocess.PIPE) subprocess.Popen() return the process object a few helpful commands to debug rosrun: 1234567ps -aux | grep "roscore" ps -aux | grep "rosmaster"killall -9 roscore killall -9 rosmaster ros1 to python3there is an real issue, due to the ADS ros package is mostly implemented by ROS1, maintained by the algorithm team. but lgsvl scenario is running with python3. ROS1 matches Python2. so there comes the solution, either to upgrade the ADS ros pakcage to ROS2, or find a way to run ROS1 in python3 env. conda base envwe had decided to adapt ROS1 in python3 env. as the host machine has conda env, first need to disable conda base. 1conda config --set auto_activate_base false as lgsvl scenario is runned with conda python3 env, inside which used subprocess to run ADS ros shell, in which creates a few new gnome-terminals, which are non log-in terminal, and the trick things here: even though disabled auto activate base, and the terminal has no header (base), but when check the python path, it still points to the conda/bin/python, which will fail rosbridge_launch.server, which is a pure ros1 and python2 module. so need check if conda --version in the ads ros shell script is in current terminal, if does, run conda deactivate, which gives error: 1CommandNotFoundError: Your shell has not been properly configured to use 'conda deactivate'. it looks there is some mess up, with init conda.sh is in ~/.bashrc, but during the new terminal ceated, it doens’t confgiure all right. which can be fixed : 123456conda --version if [ $? == 0 ]then source ~/anaconda3/etc/profile.d/conda.sh conda deactivatefi in this way, we run lgsvl scenario in python3, as well as can new with python2 terminals to run ads ros nodes from this python3 terminal ros scripts path is not python pathas we try to separate python scripts from ads ros nodes, so need some global env variable for the ros scripts path. kill all ros nodes gracefullywe need restart/shutdown the ads ros nodes package at each time when python scenario script start/stop, which has two steps: to shutdown ros nodes, e.g. rosnode kill to close all the gnome-terminals to run the ros nodes the second step is very tricky, need some understand. A terminal is a file. Like /dev/tty. Files do not have a process id. The process that “owns” the terminal is usually called the controlling process, or more correctly, the process group leader. gnome-terminal runs a single pid, it creates a child process for each and every window and/or tab. and can retrieve these child processes by the command: $ pgrep -P <pid_of_gnome-terminal> Many terminals seem to mask themselves as xterm-compatible, which is reported by echo $TERM or echo $COLORTERM. 123$! is the PID of the last backgrounded process.kill -0 $PID checks whether it's still running.$$ is the PID of the current shell. #! /bin/bash gnome_pid=`pgrep gnome-terminal` subpids=`pgrep -P ${gnome_pid}` if [ ! 
tips: the gnome-terminal based implementation above has a little problem when integrated with lgsvl: when the simulation time is short, e.g. 1 sec, ros_clear.sh can't catch the gnome-terminals opened by ros_start.sh.

find the process name using a PID
ps aux | grep PID
ps -p PID -o format
curpid=`ps -p $$ -o ppid=`
in the STAT column of the ps aux output, "+" marks a process in the foreground process group and "s" marks a session leader.

summary
so far, we have integrated the ads ros packages into python, which then talks to lgsvl.
refer:
python subprocess from Jianshu
understand linux process group
pidof command
which_term
get the info of a PID
get the pid of running terminal]]></content>
<tags>
<tag>lgsvl</tag>
<tag>ros</tag>
<tag>python</tag>
</tags>
</entry>
<entry>
<title><![CDATA[ADS stack in ros talk to lgsvl]]></title>
<url>%2F2020%2F01%2F04%2FADS-stack-in-ros-talk-to-lgsvl%2F</url>
<content type="text"><![CDATA[learning is about repeat 100 times !this maybe the third time to go through ROS, which still looks fresh, but does give a whole picture about how ROS works in ADS dev. what is ros topic and message the topic is the channel where nodes are subscribed for to read messages or where the nodes publish those messages; the message is the data itself, previously defined. 1234567891011rostopic echo [topic]rostopic list rosmsg show [message]rosrun [package_name] [node_name]rospack find [package_name]rosnode info [node_name] what is a publisher/subscriber a publisher (node) publishes messages to a particular topic. publish() is asynchronous, and only does work if there are subscribers connected on that topic. publish() itself is very fast, and it does as little work as possible: 12serialize the message to a buffer push the buffer onto a queue for later processing 1ros::Publisher advertise(const std::string& topic, uint32_t queue_size, bool latch = false); the queue size is defined in publihser/outgoing message queue. if publishing faster than roscpp can send the message over the wire, roscpp will start dropping OLD messages. 1ros::Subscriber subscribe(const std::string& topic, uint32_t queue_size, <callback, which may involve multiple arguments>, const ros::TransportHints& transport_hints = ros::TransportHints()); the queue_size is the incoming message/subscriber queue size, roscpp will use for your callback. if messages are arriving too fast and you are unable to keep up, roscpp will start throwing away OLD messages. in summary: * publish() is asynchronous. * When you publish, messages are pushed into a queue (A) for later processing. This queue is immediately pushed into the outgoing/publisher queue (B) . PS: If no one subscribes to the topic, the end is here. * When a subscriber subscribes to that topic. Messages will be sent/pushed from the corresponding outgoing/publisher queue (B) to the incoming/subscriber queue (C).--> this is done by internal thread * When you spin/ callback, the messages handled are from the incoming/subscriber queue (C). messages msg files are simple text files for specifying the data structure of a message. These files are stored in the msg subdirectory of a package. the publisher and subscriber must send and receive the same type/topic of message 1{"op": "publish", "topic": "/talker", "msg": {"data" : "_my_message" }} lgsvl defines the rosBridge class, which has a Reader() and Writer(), corresponding to Subsciber and Publisher, respectfually. rosBridge in lgsvlrosbridge is an adapter for non-ros apps to talk with ros. it’s common to package ADS software as ros node during dev stage, and rosbridge is the common way to integrate simulator with ADS software. the base implementation of rosBridge in lgsvl is as following: RosBridge.cs12ConcurrentQueue<Action> QueuedActions ;Dictionary<string, Tuple<Func<JSONNode, object>, List<Action<object>>>> Readers ; the topic is packaged as: 1234{ "op": "subscribe or publish or call_service or service_response or set_level", "topic": {}, "type": {}} AddReader(topic, callback) a few types/topices supported: Deteced3DObjectArray, Deteced2DObjectArray, VehicleControlData, Autoware.VehicleCmd, TwistStamped, Apollo.control_command which is the list of ros messages that lgsvl can parsing. 
lgsvl defines a rosBridge class, which has a Reader() and a Writer(), corresponding to Subscriber and Publisher respectively.

rosBridge in lgsvl
rosbridge is an adapter for non-ros apps to talk with ros. it's common to package the ADS software as ros nodes during the dev stage, and rosbridge is the common way to integrate a simulator with the ADS software. the base implementation of rosBridge in lgsvl is the following:
RosBridge.cs
ConcurrentQueue<Action> QueuedActions;
Dictionary<string, Tuple<Func<JSONNode, object>, List<Action<object>>>> Readers;
the topic is packaged as:
{ "op": "subscribe or publish or call_service or service_response or set_level", "topic": {}, "type": {} }
AddReader(topic, callback)
a few types/topics are supported: Detected3DObjectArray, Detected2DObjectArray, VehicleControlData, Autoware.VehicleCmd, TwistStamped, Apollo.control_command, which is the list of ros messages that lgsvl can parse.
if(!Readers.ContainsKey(topic))
{
    Readers.Add(topic, Tuple.Create<Func<JSONNode, object>, List<Action<object>>>(msg => converter(msg), new List<Action<object>>()));
}
Readers[topic].Item2.Add(msg => callback((T)msg));
AddReader backs the subscriber nodes in lgsvl. in the Sensor group, three sensors do AddReader: GroundTruth3DVisualizer, VehicleControlSensor, GroundTruth2DVisualizer. which means lgsvl can read these three types of messages; if there is no rendering/visualization need, only the vehicle control (acc, brake) message is required by lgsvl.
AddWriter(topic)
the types/topics supported are: ImageData, PointCloudData, Detected3DObjectData, Detected2DObjectData, SignalDataArray, DetectedRadarObjectData, CanBusData, GpsData, ImuData, CorrectedImuData, GpsOdometryData, ClockData.
AddWriter() is a writer adapter, which returns a writer/publisher for a specific type/topic. AddWriter backs the publisher nodes in lgsvl; in the Sensor group, the following sensors can publish messages out: LidarSensor, SignalSensor, GpsInsSensor, GpsOdometrySensor, DepthCameraSensor, ImuSensor, SemanticCameraSensor, GroundTruth2DSensor, CanBusSensor, RadarSensor, GroundTruth3DSensor, ClockSensor, GpsSensor, ColorCameraSensor.
AddService(topic, callback)
OnMessage(sender, args)
if (args.op == "publish")
{
    topic = json["topic"];
    Readers.TryGetValue(topic, out readerPair);
    var parse = readerPair.Item1;
    var readers = readerPair.Item2;
    var msg = parse(json["msg"]);
    foreach (var reader in readers)
    {
        QueuedActions.Enqueue(() => reader(msg));
    }
}
the ros subscriber uses a callback mechanism: when a message comes in, all readers that subscribe to this topic will read the message.
RosWriter.cs
the message output is in the format:
{ "op": "publish", "topic": Topic, "msg": message }

websocket in ros bridge
the implementation of the ros bridge in lgsvl uses WebSocket, which maintains a continuous communication pipeline for external ros node publishers.

ros ads talks to the lgsvl rosbridge
from the previous section, lgsvl talks to external ros nodes (the ADS stack) through rosbridge, and it needs external inputs, e.g. the vehicle control command and the 2d/3d ground truth visualizers; the ADS stack ros nodes can subscribe to the Gps, canbus, Lidar, Radar, GroundTruth, SemanticCamera, DepthCamera topics through the lgsvl rosbridge. so the pipeline is simply:
lgsvl rosbridge --> {gps, radar, camera, lidar, groundTruth messages} --> ADS stack nodes
ADS stack nodes --> {vehicle control command message} --> lgsvl rosbridge
usually the ADS stack nodes are a few related ros nodes, including the RTK/GPS sensor, the Lidar/Camera/Radar sensors, the hdmap node, the data fusion node e.t.c.

run the ROS ADS stack
the following nodes are commonly used in ADS dev:
gps_node: subscribes /gps, /odom e.t.c; publishes the /sensor/data topics
camera_node: subscribes /camera/raw_data; publishes /objects_list, /lane_object e.t.c
radar_node: subscribes /radar/raw_data; publishes /obstacle/velocity, /obstacle/distance
hdmap_node: subscribes /rtk/data, /ins/data; publishes /lane/info, /speed_limit e.t.c
fusion_node: subscribes /canbus, /rtk/data, /ifv_lane, /radar/blind, /radar/corner, /esr, /velodey, /lane/info e.t.c; publishes /object_list, /road/info, /vehicle/status and visual related messages e.t.c
planning_node: subscribes /object_list, /road/info, /vehicle/status from fusion_node; publishes /vehicle/cmd_ctl

ADS ros launch
the previous sections give a basic idea of which ROS nodes are usually needed to run the ADS stack. in road tests or simulation tests, we prefer a simple way to launch all ADS related nodes at once, which usually means a single launch file that starts all nodes, or a shell script that starts each ros node sequentially.
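a minimal sketch of what such a start-all script could look like; the aggregate launch file name (ads_bringup/ads_stack.launch) and the node package names are hypothetical, only to show the shape:
#! /bin/bash
# bring the whole ADS stack up with one command
source ~/catkin_ws/devel/setup.bash
# variant 1: one aggregate launch file
roslaunch ads_bringup ads_stack.launch &
# variant 2: start the nodes one by one, in dependency order
# rosrun gps_node gps_node &
# rosrun radar_node radar_node &
# rosrun fusion_node fusion_node &
# rosrun planning_node planning_node &
wait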
system verification
ros is a common way to package the ADS stack and integrate it with the simulation env during SIL system verification, as well as for road test data collection, since ROS is the cheap solution for ADS dev and test compared to CANape/Vector or dSPACE solutions. on the other hand, due to hardware computing limitations and driver dependencies, and especially because ros mostly runs on Linux, a non-realtime OS, it is not well suited for domain controller (HAD) testing, so in the real test environment we need to take these issues into account. another topic is how to quickly build a user-friendly verification pipeline, which keeps modifications as small as possible but can run in both simulation verification and physical verification.
refer:
ros overview
understand rosbridge: simple ROS UI
roslibjs]]></content>
<tags>
<tag>lgsvl</tag>
<tag>ros</tag>
<tag>self-driving</tag>
</tags>
</entry>
<entry>
<title><![CDATA[mobile vehicle interface]]></title>
<url>%2F2019%2F12%2F29%2Fmobile-vehicle-interface%2F</url>
<content type="text"><![CDATA[backgroundhuman-machine interface(HMI), in-vehicle device(IVD). human-vehicle interface(HVI), e.g. voice control, touch screen, face detection, e.t.c, which does well for young and passion generation, who have great voice, sharp fingers, and good-look face, and all these improving user experienced tech are based on similar AI. what about the elders ? which should be, in the next 10 - 20 years, the most tough family and society issues in China, and as well as for Japan right now. the basicaly option here is HVI is not enough for the elders groups, or at least is not the only choice for elders, who don’t have great voice, good-look face, or sharp fingers, and even no patient to talk/look/touch with the weak AI system. a very good alternative or additional supportting solution should be mobility vehicle interface(MVI) mobility vehicle interfacewhat kind of mobility devicemobility is a general name, for all kinds of mobile devices, especially for Iphone, Android smart phone and personal care robot. application scenariosfor an elder, who has daily needs for traffic transfer. e.g. go to hospital, to supermarket, to a special restuarant or a park for dinner or sit down with some old friends. the elders are slow in movement and talk, and the current AI based human-vehicle-interface(HVI) definitely make the elders feel pressure. a better or alternative solution is let the mobility device to talk to the vehicle, through an interface mobility vehicle interfacethe mobility vehicle interface(MVI) can be based on existing vehicle OS and mobility OS. there are plenty existing in-vehicle OS, e.g. CarPlayer, Baidu OS, GENIV, QNX e.t.c; and on mobility OS side, the most common are iOS and Andriod. most in-vehicle OS can run the same apps on mobility OS, e.g. google navigation map, instant messages, music, emergency call service e.t.c. so the first solution is app2app. basically the same app ran in the personal mobility device(PMD) talk to the same app run in vehicle OS. PMD has more time with the owner, so has more personality than vehicle, especially as vehicle in future is more like a public service, rather than a personal asset. that’s why mobility vehicle interface(MVI) is a good option, especially for elders, who may not enjoy talk to AI. beyond the easy to implement at this moment, app2app solution has a few limitations: the security is mainly provided by the app supplier, which is not a unit solution, as different app suppliers have different security mechanism. as apps hosted in this system is growing, the adapters or interfaces to make the bridge grows too, which decrease the user experince and increase the system cost. so a better solution is a new mobility-vehicle interface protocol, which is the only bridge between personal mobility and vehicles. and no matter what kind of apps and how many apps hosts in both system, won’t be a burden for the sytem anymore. moblity vehicle interface protocol]]></content>
</entry>
<entry>
<title><![CDATA[can MaaS survive in China]]></title>
<url>%2F2019%2F12%2F29%2Fcan-MaaS-survive-in-China%2F</url>
<content type="text"><![CDATA[the difference of US and China: citizenship vs relationshipthe first class cities, e.g. Beijing, Shanghai, Shenzhen, are not different from Chicago, New York, in normal people’s lifestyle: they share the same luxurious brands, Starbucks, Texas beaf Steak, city public services, and the same international popular elements in eletronic consumers, clothing, vehicles, and even the office env. However, down to the third class, or forth class cities in China and US, there are a huge difference. the bottom difference is citizenship(公民意识) vs relationship(关系文化). in US and most developed countries, citizenship is a common sense, no matter in small towns or big cities. in China, the residents in big cities are similar to residents in developed countries; but the normal people in third cities value more about relationship, rather than citizenship, so basically the rules how to live a high-qualitied/successful life in these cities is not universal, which means if a resident from big cities jumping to these small cities, his experince about what is a good career/life choice has to be changed, and further which has a great influence about consuming habit and the acceptance of emerging market. the Chinese goverment is pushing urbanization in most uncitizenlized areas, hopefully this process can be achieved in a few generations, which can be affected by both goverment policy and the external forces, e.g. trade war. any way, there is no short way. Chinese subside marketsin China, one kind of the most profit business is e-trade, e.g. Alibaba, JD, Pinduoduo, e.t.c. they are sinking to the third/forth cities in China in recent years, which is a special phenomenon in China, the reason as I see, is due to the division between citizenship in top class cities and relationship in most small cities in China. for most developed countries, e.g. US, the market is so flat that once one product/service is matured in big cities, there is no additional cost to expand to small towns in national wide. But here in China, the market, the society structure, the resident’s consuming habit are not flat due to the division as mentioned previously. so they need different bussniess strategy for product/service in big cities and most small towns. for the emerging market, the investors and service/product providers need input from top consulting teams, e.g. PWC, Deloitte, BCG, but the research paper from these teams try to ignore the value gap in Chinese large cities and small cities. Of course I can understand the consulting strategy, as emerging market is looking for new services in near future, and it should looks promising. taking mobility as a service (MaaS) as an example, from sharing cars to MaaS is likely happened in urban areas in next 10 years, and expand to most areas in west European and NA countries, but in the most small towns of China, it may never happen. MaaS is a promising service if the society and resident’s value are similar (or plat). for developing countries, e.g. China, India, these emerging market wouldn’t be a great success in national wide. 
start-ups in MaaS mobiag mobilleo invers maymobility vaimoo in Brazil populus.ai staflsystems polysync.io geotab public resources in moblity as a service(MaaS)Mass alliance International parking & mobility institute shared mobility services in Texas DI_Forces of change-the future of mobility PWC_how shared mobility and automation will reolution Princeton_strategies to Advanced automated and connected vehicles: a primer for state and local decision makers Accenture_mobility as a service whitepaper Bosch_HMI Toyota_Mobility ecosystem Volkswagen_E-mobility module Siemens_Intelligent Traffic Systems MaaS in UK the tech liberation front autonomous vehicle technology]]></content>
</entry>
<entry>
<title><![CDATA[configure hadoop in 2-nodes cluster]]></title>
<url>%2F2019%2F12%2F28%2Fconfigure-hadoop-in-2-nodes-cluster%2F</url>
<content type="text"><![CDATA[backgroundit’s by accident that I have to jump into data center, where 4 kinds of data need deal with: sensor verification, with huge amount of special raw sensor data AI perception training, with huge amout of fusioned sensor data synthetic scenarios data, which used to resimualtion inter-middle status log data for Planning and Control(P&C) big data tool is a have-to go through for L3+ ADS team, which has already developed in top start-ups, e.g. WeRide and Pony.AI, as well as top OEMs from NA, Europen. Big data, as I understand is at least as same important to business, as to customers. compared to AI, which is more on customer’s experience. and 2B is a trending for Internet+ diserving into traditional industry. anyway, it’s a good try to get some ideas about big data ecosystem. and here is the first step: hadoop prepare jdk and hadoop in single nodeJava sounds like a Windows langage, there are a few apps requied Java in Ubuntu, e.g. osm browser e.t.c., but I can’t tell the difference between jdk and jre, or openjdk vs Oracle. jdk is a dev toolkit, which includes jre and beyond. so when it’s always better to set JAVA_HOME to jdk folder. jdk in ubuntuthere are many different version of jdk, e.g. 8, 9, 11, 13 e.t.c. here is used jdk-11, which can be download from Oracle website, there are two zip files, the src and the other. the pre-compiled zip is enough to Hadoop in Ubuntu. 1234tar xzvf jdk-11.zip cp -r jdk-11 /usr/local/jdk-11 cd /usr/localln -s jdk-11 jdk append JAVA_HOME=/usr/local/jdk && PATH=$PATH:$JAVA_HOME/bin to ~/.bashrc, and can run test java -version. what need to be careful here, as the current login user may be not fitted for multi-nodes cluster env, so it’s better to create the hadoop group and hduser, and use hduse as the login user in following steps. create hadoop user1234sudo addgroup hadoopsudo adduser --ingroup hadoop hduser sudo - hduser #login as hduser the other thing about hduser, is not in sudo group, which can be added by: curren login user is hduser: 12345678910111213141516171819202122groups # hadoopsu - # but password doesn't correct#login from the default user terminalsudo -i usermod -aG sudo hduser#backto hduser terminalgroups hduser # : hadoop sudo exit su - hduser #re-login as hduser ``` #### install and configure hadoop hadoop installation at Ubuntu is similar to Java, which has src.zip and pre-build.zip, where I directly download the `pre-build.zip`.another thing need take care is the version of hadoop. since `hadoop 2.x` has no `--daemon` option, which will leads error when master node is with `hadoop 3.x`.```shelltar xzvf hadoop-3.2.1.zip cp -r hadoop-3.2.1 /usr/local/hadoop-3.2.1cd /usr/localln -s hadoop-3.2.1 hadoop add HADOOP_HOME=/usr/local/hadoop and PATH=$PATH:$HADOOP_HOME/bin to ~/.bashrc. test with hadoop version hadoop configure is find here there is another issue with JAVA_HOME not found, which I modify the JAVA_HOME variable in $HADOOP_HOME/etc/hadoop/hadoop_env.sh passwordless access among nodes generate SSH key pair 1234on maste node:ssh-keygen -t rsa -b 4096 -C "master"on worker node:ssh-keygen -t rsa -b 4096 -C "worker" the following two steps need do on both machines, so that the local machine can ssh access both to itself and to the remote. enable SSH access to local machine ssh-copy-id [email protected] copy public key to the remote node ssh-copy-id [email protected] tips, if changed the default id_rsa name to sth else, doesn’t work. 
test hadoop
on the master node:
hduser@ubuntu:/usr/local/hadoop/sbin$ jps
128816 SecondaryNameNode
128563 DataNode
129156 Jps
128367 NameNode
on the worker node:
hduser@worker:/usr/local/hadoop/logs$ jps
985 Jps
831 DataNode
and then test with a mapreduce job.]]></content>
<tags>
<tag>hadoop</tag>
</tags>
</entry>
<entry>
<title><![CDATA[deploy lgsvl in docker swarm 4]]></title>
<url>%2F2019%2F12%2F25%2Fdeploy-lgsvl-in-docker-swarm-4%2F</url>
<content type="text"><![CDATA[backgroundpreviously, tried to deploy lgsvl in docker swarm, which is failde due to the conflict of host network to run lgsvl and the routing mesh of swarm, as I thought. http listen on *why used –network=host, is actually not a have-to, the alternative option is to use "*" as Configure.webHost, instead of localhost nor a special IP address, which lead o HttpListener error: 1The requested address is not vaid in this context. then, we can docker run lgsvl without host network limitations. but still, if run by docker service create, it reports failure: Error initiliazing Gtk+. Gtk/UI in Unitywhen starting lgsvl, it pops the resolution window, which is a plugin of Unity Editor, and implemented with gtk, as explained in last section, which leads to the failure to run lgsvl as service in docker swarm. the simple solution is to disable resolution selection in Unity Editor. 1Build Settings --> Player Settings --> Disable Resolution then the popup window is bypassed. ignore publish portI tried to ignore network host and run directly with routing mesh, but it still doesn’t work. then I remember at the previous blog, when run vkcube or glxgears in docker swarm, it actually does use --network host, so it looks the failure of running lgsvl in docker swarm, is not due to network host, but is due to Gtk/gui. as we can bypass the resolution UI, then directly running as following, works as expected: 1sudo docker service create --name lgsvl --generic-resource "gpu=1" --replicas 2 --env DISPLAY --mount src="X11-unix",dst="/tmp/.X11-unix" --network host lgsvl add assets into containeranother update is to bind assets from host into lgsvl image, which is stored as sqlite data.db, which is a necessary, as we bypassed the authentication, and the cluster has no access to external Internet. where is nextin recent two month, had digged into docker swarm to run lgsvl. so far, the main pipeline looks work now, and there are still a lot little fix there. to run AV simulation in cloud, is a necessary way to test and verify L3+ AV algorithms/products. previous ADAS test is more on each individual feature itself, e.g. ACC, AEB .e.t.c, all of which are easy to define a benchmark test case, and engineers can easily define the test scenarios systemetically. But for L3+, the env status space is infinite in theory, there is no benchmark test cases any more, and at best we can do is statiscally test cases, which requires a huge number of test cases, which is where virutal simulation test in cloud make sense. from tech viewpoint, the next thing is how to drive the L3+ dev by these simulation tools. and another intesting is the data infrastructure setup]]></content>
<tags>
<tag>lgsvl</tag>
<tag>Docker</tag>
</tags>
</entry>
<entry>
<title><![CDATA[web service bypass in lgsvl]]></title>
<url>%2F2019%2F12%2F15%2Fweb-service-bypass-in-lgsvl%2F</url>
<content type="text"><![CDATA[backgroundpreviously had talked lg new version code review, where introduced the new server-browser arch, which is focused on the lgsvl server side implementation, which was based on Nancy and sqliteDB; also a simple introduction about reactjs the gap how client send a http request to the server is done by axios. also another issue is how to access asset resource across domain. namely, running the lgsvl service at one host(192.168.0.10), and http request send from another remote host(192.168.0.13). Axiosthe following is an example on how Axios works. 12345678910111213141516171819202122constructor() { this.state = { user: null } }componentDidMount() { axios.get('https://dog.ceo/api/breeds/image/random') .then(response => { console.log(response.data); if(response.status == 200) setState(user, reponse.data) }) .catch(error => { console.log(error); });} render() { return ( ) } from React componnet to DOM will call componentDidMount(), inside which axios send a GET request to https://dog.ceo/api/breeds/image/random for a random dog photo. and can also store the response as this component’s state. enactenact is a React project manager, the common usage: 123enact create . # generate project at current dir npm run serve npm run clean enact prject has a configure file, package.json, while can specify the proxy, which is localhost by default. if want to bind to a special IP address, this is the right place to modify. 1234"enact": { "theme": "moonstone", "proxy": "http://192.168.0.10:5050"}, inside lgsvl/webUI, we need do this proxy configure, to support the across-domain access. Nancy authenticationthis.RequiresAuthentication(), which ensures that an authenticated user is available or it will return HttpStatusCode.Unauthorized. The CurrentUser must not be null and the UserName must not be empty for the user to be considered authenticated. By calling this RequiresAuthentication() method, all requests to this Module must be authenticated. if not authenticated, then the requests will be redirected to http://account.lgsimulator.com. You need to include the types in the Nancy.Security namespace in order for these extension methods to be available from inside your module. this.RequiresAuthentication() is equal to return (this.Context.CurrentUser == null) ? new HtmlResponse(HttpStatusCode.Unauthorized) : null; all modules in lgsvl web server are authenticated by Nancy:RequiresAuthentication(), for test purpose only, we can bypass this function, and pass the account directly: 1234// this.RequiresAuthentication();// return service.List(filter, offset, count, this.Context.CurrentUser.Identity.Name)string currentUsername = "[email protected]";return service.List(filter, offset, count, currentUsername) in this way, no matter what’s the account in React client, server always realize the http request is from the user [email protected]. sqlite dbin Linux, sqlite data.db is stored at ~/.config/unity3d/<company name>/<product name>/data.db in Windows, data.db is stored at C:/users/username/AppData/LocalLow/<company name>/<product name>/data.db it’s interesting when register at lgsvlsimualtor.com, and it actually send the account info back to local db, which give the chance to bypass. debug webUIthe chrome and firefox has react-devtools plugins, which helps, but webUI doesn’t use it directly. to debug webUI it’s even simpler to go to dev mode in browser, and checking the few sections is enough refer bring your data to the front]]></content>
<tags>
<tag>lgsvl</tag>
<tag>react</tag>
</tags>
</entry>
<entry>
<title><![CDATA[deploy lgsvl in docker swarm 3]]></title>
<url>%2F2019%2F12%2F15%2Fdeploy-lgsvl-in-docker-swarm-3%2F</url>
<content type="text"><![CDATA[backgroundpreviously tried to run Vulkan in virtual display, which failed as I understand virtual display configure didn’t fit well with Vulkan. so this solution is direct display to allow each node has plugged monitor(which is called PC cluster). for future in cloud support, current solution won’t work. and earlier, also tried to deploy lgsvl in docker swarm, which so far can work with Vulkan as well, after a little bit understand X11 a few demo test can run as followning: deploy glxgears/OpenGL in PC cluster123export DISPLAY=:0 xhost + sudo docker service create --name glx --generic-resource "gpu=1" --constraint 'node.role==manager' --env DISPLAY --mount src="X11-unix",dst="/tmp/.X11-unix" --mount src="tmp",dst="/root/.Xauthority" --network host 192.168.0.10:5000/glxgears deploy vkcube/Vulkan in PC cluster123export DISPLAY=:0 xhost + sudo docker service create --name glx --generic-resource "gpu=1" --constraint 'node.role==manager' --env DISPLAY --mount src="X11-unix",dst="/tmp/.X11-unix" --mount src="tmp",dst="/root/.Xauthority" --network host 192.168.0.10:5000/vkcube deploy service with “node.role==worker”123export DISPLAY=:0xhost + sudo docker service create --name glx --generic-resource "gpu=1" --constraint 'node.role==worker' --env DISPLAY --mount src="tmp",dst="/root/.Xauthority" --network host 192.168.0.10:5000/glxgears deploy service in whole swarm123xhost + export DISPLAY=:0 sudo docker service create --name glx --generic-resource "gpu=1" --replicas 2 --env DISPLAY --mount src="tmp",dst="/root/.Xauthority" --network host 192.168.0.10:5000/vkcube which deploy vkcube service in both manager and worker node: overall progress: 2 out of 2 tasks 1/2: running [==================================================>] 2/2: running [==================================================>] verify: Service converged understand .Xauthorityas docker service can run with --mount arguments, which give the first try to copy .Xauthority to manager node, but .X11-unix is not copyable, which is not a normal file, but a socket. in docker service create, when create a OpenGL/vulkan service in one remote worker node, and using $DISPLAY=:0, which means the display happens at the remote worker node. so in this way, the remote worker node is played the Xserver role; and since the vulkan service is actually run in the remote worker node, so the remote worker node is Xclient ? assuming the lower implement of docker swarm serviceis based on ssh, then when the manager node start the service, it will build the ssh tunnel to the remote worker node, and with the $DISPLAY variable as null; even if the docker swarm can start the ssh tunnel with -X, which by default, will use the manager node’s $DISPLAY=localhost:10.0 Xauthority cookie is used to grant access to Xserver, so first make sure which machine is the Xserver, then the Xauthority should be included in that Xserer host machine. a few testes: 1234567ssh in worker: echo $DISPLAY --> localhost:10.0xeyes --> display in master monitor ssh in worker: xauth list --> {worker/unix:0 MIT-MAGIC-COOKIE-1 19282b0a651789ed27950801ef6f1441; worker/unix:10 MIT-MAGIC-COOKIE-1 a6cbe81637207bf0c168b3ad20a9267a }in master: xauth list --> { ubuntu/unix:1 MIT-MAGIC-COOKIE-1 ee227cb9465ac073a072b9d263b4954e; ubuntu/unix:0 MIT-MAGIC-COOKIE-1 75893fb66941792235adba22362c4a6f; ubuntu/unix:10 MIT-MAGIC-COOKIE-1 785f20eb0ade772ceffb24eadeede645 } so which cookie is is for this $DISPLAY ? 
deploy the lgsvl service in swarm
xhost +
export DISPLAY=:0
sudo docker service create --name lgsvl --generic-resource "gpu=1" --replicas 2 --env DISPLAY --mount src="tmp",dst="/root/.Xauthority" --network host --publish published=8080,target=8080 192.168.0.10:5000/lgsvl
which gives:
overall progress: 0 out of 2 tasks
1/2: container cannot be disconnected from host network or connected to host network
2/2: container cannot be disconnected from host network or connected to host network
basically, the service is deployed on the ingress network by default, but the service is also configured with the host network, so the two conflict.

swarm network
the routing mesh is the default internal load balancer in the swarm network; the other choice is to deploy the service directly on the node, namely bypassing the routing mesh, which requires the service to run in global mode with the published port set as mode=host, and which should be equivalent to --network host in replicas mode. the limitation of bypassing the routing mesh is that there is only one task per node, and accessing the published port only reaches the service on that particular node, which doesn't make sense in a cloud env.
docker service create \
  --mode global \
  --publish mode=host,target=80,published=8080 \
  --generic-resource "gpu=1" \
  --env DISPLAY \
  --mount src="tmp",dst="/root/.Xauthority" \
  --mount src="X11-unix",dst="/tmp/.X11-unix" \
  --network host \
  --name lgsvl \
  lgsvl:latest
tips: --mount src="X11-unix",dst="/tmp/.X11-unix" is kind of cached: once the docker image has run on the worker node, this parameter doesn't need to be passed again, but once the worker node restarts, it is needed again.
in summary, for the swarm network the routing mesh should be the right solution for cloud deployment. so how do we get rid of --network host? the reason for the host network is that the lgsvl server and webUI work on the same host; if they don't, there is a kind of cross-domain security failure, which is actually another topic, namely how to host lgsvl and webUI/React on different hosts. need some study of webUI in the next post.]]></content>
<tags>
<tag>lgsvl</tag>
<tag>Docker</tag>
</tags>
</entry>
<entry>
<title><![CDATA[where are you in next 4 years (1)]]></title>
<url>%2F2019%2F12%2F08%2Fwhere-are-you-in-next-4-years-1%2F</url>
<content type="text"><![CDATA[it’s more than one year since I started this series “where are you in the next 5 years”, I wound love to transfer to “the next 4 years”, and thanks for the opportunity to get back in China, so there is a high chance to involve in the market heavily in a short time. at the begining of the year, travelled around the whole nation, stay in Shanghai, Beijing, Shenzhen, Guangzhong. and that was a great chance to get familiar with the startups in autonomus vehicle, as this trip really gave me some input, and till now I had another half year in one of the top OEMs in China. combined with this two sources, which gave me the kind of the whole picture of ADS market happening in China. I would love to write this blog more in bussiness thought, rather than engineering way. L4 startupsADS leap time is about 2016 to the first half year of 2018. there are a bunch of startups and also most OEMs have build their ADS teams. the startups, e.g. Pony.ai, WeRide, AutoX, roadstar(the new split), ToSimple, Momenta. which are still very active recently. Today I have taken PlusAI’s tech open day, I have to say, most of these startups have very similar tech roadmap. I personally, think that’s a really sad thing. a few teches they all have: simulation pipeline data pipeline(collect, label, training) AI based perception, motion planning friendly HMI WeRide and Pony.ai are in Robtaxi service; ToSimple and PlusAI are in highway logistics; Momenta is in harbor transformation. Alibaba, jingdong, Meituan e.t.c are in small personal package delivery shuttles, similar as Nuro. OEMs focus in the passenger vehicles. DiDi focus in taxi services as well, similar as Uber, Waymo. all of them can be called as ADS service suppliers. however, most of them use the exactly same sensor packages, including Lidar, Camera, Radar, GPS e.t.c. the software stacks during prodcut dev as mentioned above are mostly similar; there maybe a few special features during the services in deployment, e.g. Robtaxi may have a Uber-like call-taxi app e.t.c, Rather than that, nothing really is amazing about ADS itself. and mostly this is not a tech problem, it’s must be defined or find out by the social guys, who are from the real needs. in the engineering work environment, it’s easily to misunderstand the role of engineering. engineering is the bumper, only when the house need fixed, there is a need for bumper. However, in an engineering-centered env, it’s so easy to tell no difference between I have the bumper and I have the needs. My experieince till now, I am learning how to use the bumper well, but few thinks why need to learn to use the bumper. on the other hand, what kind of tech is really helpful or profitable? by chance, to talk with Unity China team, who are enhancing Unity3D engine with cloud support, unity simualtion, which is the feature I am looking for a while. if the tech pipeline is the waterflow, the Unity team is the one standing at the upper flow, who can implement new features in engine. just like Nvidia, Google e.t.c, these are the guys who really make a difference with their tech. and it’s profitable of course.]]></content>
</entry>
<entry>
<title><![CDATA[where are you in next 4 years (2)]]></title>
<url>%2F2019%2F12%2F08%2Fwhere-are-you-in-next-4-years-2%2F</url>
<content type="text"><![CDATA[backgroudjoined the zhongguancun self-driving car workshow today, different level than PlusAI’s tech show at last weekend. there are a few goverment speakers, e.g. zhongguancun tech officer. and sensor suppliers, e.g. SueStar, zhongke hui yan, holo matic, self-driving solution suppliers, e.g. pony.ai; xiantong tech e.t.c, and media press and investors, interesting thing the investors doesn’t look promising. as mentioned last time, the ADS solution suppliers and most sensor startups join the champion, in about 2 years. and goverment are invovled with policy-friendy support. just the captial market or investors don’t buy it at this moment. XianTongXianTong, focused in city road cleaning, which is a special focus, rather than passenger vehicles, pacakge trucks, or small delivery robotics. They have some data in the cleanning serivces in large cities, e.g. beijing, shanghai, hangzhou, and Europen. current cleaner laybor’s harsh and dangerous working env current cleaner laybor’s limitation to working hours, benefits requirements they mentioned the city cleaning market is about 300 billions in China, which looks promising, but how much percent of the cleaning vechiles in this market is not talked. it’s maybe about 20% ~ 60%, as there are a lot human resources, city green plant needs e.t.c, which eats a lot of money, and the current cleaner vehicle products that support ADS maybe has an even smaller part in the whole vehicles used in city cleaning services. so the whole city clearning service market sounds promising, but down to the clearning vehicles, and especailly without a matured and in-market cleaner vehicle product, it’s really difficult to digest and dig golden from the market. I have a feeling, most startups has the similar gap, they do vision big, e.g. to assist the city, the companies, the bussiness, the end customers run/live more efficiently/enjoyable/profitable. but the reality is not that friendly for them, as they spent investor’s money, which expect to get profitable return in a short time. which may push the startups to draw big pictures far beyond their ability, or even far beyond the whole industry’s ability. as well as they draw big pictures, they are very limited to deep into the market, to understand the customers, to design the products with original creativity. creativity or applicationfor investors, these a special industry-application based startups, I think, at most may get investing profit at 1 ~ 4 times. maybe it’s a good idea to understand the successful invest cases happened in last 5 years. And I am afraid that’s also a self-centered option, that most CV happened in high-tech, Internet-based startups. cause the current self-driving market, especially the startups in China, which focus in ADS full-stack solutions, sensors, services providers, are not a game-change player. in history, the first successful company to product PCs, smart phones, Internet searching serivce, social network servie, taxing service, booking (restrount) service, food delivery service, they are game changers. and somehow they are most talked in public, in investers, and most understandable by most knowledge consumers, e.g. engineers. but are they the whole picture of the national economy? what about the local seafood resturant, the local auto repair shops; or the company who product pig’s food, who product customers’ bikes; or the sales company who sell steel to Afria. The economy is more plentiful and complex, than a mind-straight knowledge consumer can image. 
For a long time, I didn’t realise the local food restarant or a local gym, but they do higher money payback, and definitely higher social status, than a fixed income engineer. so don’t try to find out the secret of each component of the whole economy, and then try to find out the most optimized way to live a life. there is no, or every way to live a life, it is the most optimized way. so the CEOs of these startups, are not crazy to image themselves as the game changers, like Steve Jobs, so they know their energy. surviving firstsince as they know their limitation, so they are not nuts, so they are just enough energy to find a way to survive, even not in a good shape. that’s the other part, as an naive young man, always forgot the fact that surviving is not beautiful most times. the nature has tell the lesson: the young brids only kill his/her brothers/sisters, then he/she can survive to grow into adult. for the deers in Afria plant, each minite is either run to survive or being eaten. companies or human as a single, has the same situation, even the goverment try to make it a little less harsh, but most time to survive is difficult. market sharingas a young man, 1000 billion market sounds like a piece of cake, when comes to the small company, who work so hard to get a million level sale, sounds like a piece of shit. and that’s another naive mind about the reality. like I can’ see the money return from a local seafood resturout, and when I found out it does get more than a million every year, the whole me is shocked. so there is no big or small piece of cake, as it come to survive. most CEOs, they are not nuzy, and they know clearly in their heart, that their company need to survivie and make a profitable income, and that’s enough, to change the world is not most people’s responsibility, but survive it is. however, these CEOs in publich, they talk their company or product as the game-changers, that’s what the investors’ want they to say. so don’t think the market is too small any more, as well as it can support a family to live a life. dream is not the option to most people, that’s the final summary. but survive is a have-to. life is evergreenlife is not only about surviving. if else, human society is an animal world. thinking reasonal always give the doomed impression, and the life in blue; once more perceptual in mind, the day is sunny and joyful. “the theory is grey, but the tree of life is evergreen” develop team in OEMas mentioned, currently there are plenty of system simulation verification special suppliers, e.g. 51 vr, baidu, tencent, alibaba e.t.c, and definitely there softwares are more matured than our team. I am afriad jsut at this moment, the leader teams in OEM don’t realize that to build a simualtion tool espcially to support L3 is mission-impossible. if else, the requirements for simulation, should come from suppliers, rather than build OEM’s own simualtion develop team. I still remember the first day to join this team, sounds like we are really creating something amazing, and nobody else there have more advantages than us. then gradually I realize we can’t customize the Unity engine, we can’t support matured cloud support, we can’t even implement the data pipeline for road test log and analysis. most current team members work stay in requirements level, and in a shadow mode. and actually most of these needs, does have a few external companies/teams have better solution. 
there does a lot existing issues, from software architecture to ADS algorithms restructure, but these work is mostly not done by OEM develop team. the second-level developing team, can’t make the top-class ADS product. as the new company will split out, this team is less possible to survive in the market, or the leaders have to set up a new developing team. if AI is the direction, that’s another huge blank for this team. I think either go to the ADS software architecture or to AI is a better choice now.]]></content>
</entry>
<entry>
<title><![CDATA[running Vulkan in virtual display]]></title>
<url>%2F2019%2F12%2F04%2Frunning-Vulkan-in-virtual-display%2F</url>
<content type="text"><![CDATA[install xserver-xorg-video-dummyapt-cache search xserver-xorg-video-dummy apt-get update sudo apt-get install xserver-xorg-video-dummy which depends on xorg-video-abi-20 and xserver-xorg-core, so need to install xserver-xorg-core first. after update xorg.conf as run Xserver using xserver-xorg-video-dummy driver, and reboot the machine, which leads both keyboard and mouse doesn’t reponse any more. understand xorg.confusually, xorg.conf is not in system any more, so most common use, the xorg will configure the system device by default. if additional device need to configure, can run in root X -configure, which will generate xorg.conf.new file at /root. there are two xorg.conf, one generated by running X -configure, which located at /root/xorg.conf.new ; the other is generated by nvidia-xconfigure, which can be found at /etc/X11/xorg.conf. the following list is from xorg.conf doc ServerLayout section it is at the highest level, they bind together the input and output devices that will be used in a session. input devices are described in InputDevice sections, output devices usualy consist of multiple independent components(GPU, monitor), which are defined in Screen section. each Screen section binds togethere a graphics board(GPU) and a monitor. the GPU are described in Device sections and monitors are described in Monitor sections FILES section used to specify some path names required by the server. e.g. ModulePath, FontPath .. SERVERFLAGS section used to specify global Xorg server options. all should be Options "AutoAddDevices", enabled by default. MODULE section used to specify which Xorg server (extension) modules shoul be loaded. INPUTDEVICE section Recent X servers employ HAL or udev backends for input device enumeration and input hotplugging. It is usually not necessary to provide InputDevice sections in the xorg.conf if hotplugging is in use (i.e. AutoAddDevices is enabled). If hotplugging is enabled, InputDevice sections using the mouse, kbd and vmmouse driver will be ignored. Identifier and Driver are required in all InputDevice sections. Identifier used to specify the unique name for this input device; Driver used to specify the name of the driver. An InputDevice section is considered active if it is referenced by an active ServerLayout section, if it is referenced by the −keyboard or −pointer command line options, or if it is selected implicitly as the core pointer or keyboard device in the absence of such explicit references. The most commonly used input drivers are evdev(4) on Linux systems, and kbd(4) and mousedrv(4) on other platforms. a few driver-independent Options in InputDevice: CorePointer and CoreKeyboard are the inverse of option Floating, which, when enabled, the input device does not report evens through any master device or control a cursor. the device is only available to clients using X input Extension API. Device section there must be at least one, for the video card(GPU) being used. Identifier and Driver are required in all Device sections. Monitor Sectionthere must be at least one, for the monitor being used. the default configuration will be created when one isn’t specified. Identifier is the only mandatory. Screen SectionThere must be at least one, for the “screen” being used, represents the binding of a graphics device (Device section) and a monitor (Monitor section). A Screen section is considered “active” if it is referenced by an active ServerLayout section or by the −screen command line option. 
The Identifier and Device entries are mandatory. debug keyboard/mouse not response after X upgrade login to Ubuntu safe mode, by F12 –> Esc (to display GRUB2 menu), then enable network –> root shell run X -configure one line say: List of video drivers: dummy, nvidia, modesetting. uninstall xserver-xorg-video-dummyI thought the dummy video driver is the key reason, so uninstall it, then rerun the lines above, check /var/log/Xorg.0.log: 1234567891011121314151617181920212223[ 386.768] List of video drivers:[ 386.768] nvidia[ 386.768] modesetting[ 386.860] (++) Using config file: "/root/xorg.conf.new"[ 386.860] (==) Using system config directory "/usr/share/X11/xorg.conf.d"[ 386.860] (==) ServerLayout "X.org Configured"[ 386.860] (**) |-->Screen "Screen0" (0)[ 386.860] (**) | |-->Monitor "Monitor0"[ 386.861] (**) | |-->Device "Card0"[ 386.861] (**) | |-->GPUDevice "Card0"[ 386.861] (**) |-->Input Device "Mouse0"[ 386.861] (**) |-->Input Device "Keyboard0"[ 386.861] (==) Automatically adding devices[ 386.861] (==) Automatically enabling devices[ 386.861] (==) Automatically adding GPU devices[ 386.861] (**) ModulePath set to "/usr/lib/xorg/modules"[ 386.861] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.[ 386.861] (WW) Disabling Mouse0[ 386.861] (WW) Disabling Keyboard0Xorg detected mouyourse at device /dev/input/mice.Please check your config if the mouse is still notoperational, as by default Xorg tries to autodetectthe protocol. there is a warning: (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled disable Hotpluggingfirst generate by X -configure at /root/xorg.conf.new, and copy it to /etc/X11/xorg.conf. then add the additional section in /etc/X11/xorg.conf, , which will disable Hotplugging: 1234Section "ServerFlags"Option "AllowEmptyInput" "True"Option "AutoAddDevices" "False"EndSection however, it reports: 12345678(EE) Failed to load module "evdev" (module does not exist, 0)(EE) NVIDIA(0): Failed to initialize the GLX module; please check in your X(EE) NVIDIA(0): log file that the GLX module has been loaded in your X(EE) NVIDIA(0): server, and that the module is the NVIDIA GLX module. If(EE) NVIDIA(0): you continue to encounter problems, Please try(EE) NVIDIA(0): reinstalling the NVIDIA driver.(EE) Failed to load module "mouse" (module does not exist, 0)(EE) No input driver matching `mouse' switch to nvidia xorg.confwhich reports: 1234(EE) Failed to load module "mouse" (module does not exist, 0)(EE) No input driver matching `mouse'(EE) Failed to load module "evdev" (module does not exist, 0)(EE) No input driver matching `evdev' it fix the Nvidia issue, but still can’t fix the input device and driver issue. switch to evdev driveras mentioned previously, evdev driver is the default driver for Linux, and will be loaded by Xserver by default. so try to both Keyboard and Mouse driver to evdev, which reports: 1234(EE) No input driver matching `kbd'(EE) Failed to load module "kbd" (module does not exist, 0)(EE) No input driver matching `mouse'(EE) Failed to load module "mouse" (module does not exist, 0) looks it’s the problem of driver, even the default driver is missed. I try to copy master node’s /usr/lib/xorg/modules/input/ to worker node, then it reports : 12(EE) module ABI major version (24) doesn't match the server's version (22)(EE) Failed to load module "evdev" (module requirement mismatch, 0) which can be fixed by adding Option IgnoreABI . 
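a small sketch of the checks that help reason about this kind of input-driver/ABI mismatch (nothing here changes the system, it only inspects it):
# what input driver modules are actually installed on this node?
ls /usr/lib/xorg/modules/input/
dpkg -l | grep xserver-xorg-input      # the packaged drivers (evdev, synaptics, ...)
# which input ABI does the running server expect, and what failed to load?
grep -i "abi" /var/log/Xorg.0.log
grep "(EE)" /var/log/Xorg.0.log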
delete customized keyboard and mouseif enable Hotplugging, the X will auto detect the device, I’d try: 1234567891011121314151617181920212223242526272829303132333435Section "ServerLayout" Identifier "Layout0" Screen 0 "Screen0" 0 0 EndSectionSection "Monitor" Identifier "Monitor0" VendorName "Unknown" ModelName "Unknown" HorizSync 28.0 - 33.0 VertRefresh 43.0 - 72.0 Option "DPMS"EndSectionSection "Device" Identifier "Device0" Driver "nvidia" VendorName "NVIDIA Corporation"EndSectionSection "Screen" Identifier "Screen0" Device "Device0" Monitor "Monitor0" DefaultDepth 24 SubSection "Display" Depth 24 EndSubSectionEndSectionSection "ServerFlags" Option "AllowEmptyInput" "True" Option "IgnoreABI" "True"EndSection which reports: 123(II) No input driver specified, ignoring this device.(II) This device may have been added with another device file.(II) config/udev: Adding input device Lenovo Precision USB Mouse (/dev/input/mouse0 there is no ERROR any more, but looks the default Input driver (evdev?) can’t be found out … reinstall xorgMouse and keyboard can be driven by evdev or mouse/keyboard driver respectively. Xorg will load only endev automatically, To use mouse and/or keyboard driver instead of evdev they must be loaded in xorg.conf. There is no need to generate xorg.conf unless you want to fine tune your setup or need to customize keyboard layout or mouse/touchpad functionality. firstly configure new network interface for worker node: configure DHCP network connection setting at /etc/network/interface: 12auto enp0s25 iface enp0s25 inet dhcp ifconfig enp0s25 downifconfig enp0s25 up then reinstall xorg: sudo apt-get update sudo apt-get upgrade sudo apt-get install xserver-xorg-core xserver-xorg xorg which install these libs, xserver-xorg-input-all, xserver-xorg-input-evdev, xserver-xorg-inut-wacom, xserver-xorg-input-vmouse, xserver-xorg-input-synaptics, these are the exact missing parts(input device and drivers). it looks when uninstall video-dummy, these modules are deleted by accident. reboot, both keyboard and mouse work ! “sudo startx” through ssh now the user password doesn’t work in normal login, but when ssh login from another machine, the password verify well. which can be fixed by ssh login from remote host first, then run sudo startx, which will bring the user-password verification back virtual displayxdummy xdummy: xorg.conf run: Xorg -noreset +extension GLX +extension RANDR +extension RENDER -logfile ./10.log -config ./xorg.conf :10 test with glxgears/OpengGL works 1) DISPLAY=localhost:10.0 works 2) DISPLAY=:0 works, but you can’t see it, cause the worker host is in virtual display test with vkcube/Vulkan failed in summary, the vitual display can support OpenGL running, but doesn’t support Vulkan yet. unity simulation cloud SDK is the vendor’s solution, but licensed. refersample xorg.conf for dummy device Keyboard and mouse not responding at reboot after xorg.conf update how to Xconfigure no input drivers loading in X]]></content>
</entry>
<entry>
<title><![CDATA[recent thoughts in ADS]]></title>
<url>%2F2019%2F11%2F29%2Frecent-thoughts-in-ADS%2F</url>
<content type="text"><![CDATA[the following are some pieces of ideas during discussion and online resources. system engineering in practicalthe following idea is coming from the expert of system engineering. originally, system engineering or model based design sounds come from aerospace, defense department. the feature of these products: 1) they are the most complex system 2) they are sponsored by the goverment 3) they are unique and no competitors which means they don’t need worry about money and time, so to gurantee the product finally works, they can design from top to down in a long time. the degrade order of requirements level comes as: areospace, defense product >> vehicle level product >> industry level product >> customer level product usually the techs used in the top level is gradually degrading into the next lower level in years. e.g. GPS, Internet, autonomous e.t.c. at the same time, the metholodies from top level go to lower level as well. I suppose that’s why system engineeering design comes to vehicle industry. however, does it really work in this competitional industry? I got the first experince when running scenaio testes in simulation SIL. as the system engineering team define the test cases/scenarios, e.g. 400 test scenarios; on the other hand, the vehicle test team does the road test a few times every week.the result is, most time the 400 test scenarios never catch a system failure; but most road test failure scenario does can be repeated in the simulation env. system engineering based design doesn’t fit well. there are a lot reasons. at first, traditionally the design lifetime of a new vehicle model is about 3~5 years, and startup EV companies recently has even shorter design life cycle, about 1~2 years. so a top-down design at the early time, to cover every aspect of a new model, does almost not make sense. in the V development model, most fresh engineers thought the top-down design is done once for all, the reality is most early stage system engineering desgin need be reconstructured. secondly, system engineering design usually is abstract and high beyond and except engineering considerations, as the system engineers mostly doesn’t have engineering experience in most sections of the sysetem. which results in the system engineering based requirements are not testable, can’t measure during section implementation. there are a few suggestions to set a workable system engineering process: the system engineering team should sit by the develop teams and test teams, they should have a lot of communication, and balance the high-level requirements and also testable, measurable, implementable requirements. basically, system engineering design should have product/component developers as input. both the system engineers and developers should understand the whole V model, including system requirements, component requirements are iteratable. focus on the special requirement, and not always start from the top, each special requirement is like a point, and all these existing points(already finished requirements) will merged to the whole picture finally. take an example, during the road test, there will come a new requirement to have a HMI visulization, then focus on this HMI requirement, cause this requirements may not exist in the top down design. but it is the actual need. system test and verification CI/CDas most OEMs have said they will massive product L3 ADS around 2022, it is the time to jump into the ADS system test and verification. 
just knew that Nvidia has the full-stack hardware lines: the AI chips in car(e.g. Xavier), the AI training workstation(e.g. DGX), and the ADS system verifcation platform(e.g. Constallation box). data needsthe ADS development depends a lot on data infrastructure: data collect --> data storage --> data analysis there are many small pieces as well, e.g. data cleaning, labeling, training, mining, visulization e.t.c from different dev stage or teams, there are different focus. road test/sensor team, they need a lot of online vehicle status/sensor data check, data logging, visulization(dev HMI), as well as offline data analysis and storage perception team, need a lot of raw image/radar data, used to train, mine, as well as to query and store. planning/control team, need high quality data to test algorithms as well as a good structured in-car computer. HMI team, are focusing on friendly data display fleet operation team, need think about how to transfer data in cloud, vehicle, OEM data centers e.t.c. sooner or later, data pipepline built up is a have to choice. data collection vendorsroad test data collection equipment used in ADS development, is actually not a very big market, compared to in-var computers. but still there are a few vendors already. the top chip OEMs, e.g. Nvidia, Intel has these products. chip poxy, e.g. Inspire traditional vehicle test vendors, e.g. Dspace, Vector, Prescan startups, e.g. horizon Nvidia constellationADS system test usually includes simulation test and road test. and the road test is also called vehicle-in-loop, which is highly expensive and not easy to repeat; then is hardware-in-loop(HIL) test, basically including only the domain controller/ECU in test loop; finally is the is software-in-loop(SIL) test, which is most controllable but also not that reliable. in practical, it’s not easy to build up a closed-loop verification(CI/CD) process from SIL to HIL to road test. and once CI/CD is setted up, the whole team can be turned into data/simulation/test driven. the difficult and hidden part is the supporting toolchain development. Most vehicle test vendors have their special full-stack solution toolchains, but most of them are too eco-customized, it’s really difficult for ADS team, specially OEMs, to follow a speical vendor solution. another reason, test vehicles include components from different vendors, e.g. camera from sony, radar from bosch, Lidar from a Chinese startup, logging equipment from dSpace, and ECUs from Conti. which makes it difficult to fit this mixed system into a Vector verification platform. Nvidia Constallation is trying to meet the gap from SIL, HIL to road test. as it can suppport most customized ECUs. from road test to HIL, it use the exactly same chip. for road test resimulation, Nvidia offer a sim env, and the road test log can feed in directly the ability to do resimulation of road test is a big step, the input is directly scanned images/cloud points, even lgsvl, Carla has no such direct support. but resimulation is really useful for CI/CD. Nvidia constallation as said, is the solution from captured data to KPIs. another big thing is about their high-level scenario description language(HLSDL), which I think is more abstract than OpenScenario. the HLSDL engine use hyper-parameters, SOTIF embedded scenario idea, and optimized scenario generator, which should be massive, random as well as KPI significantly, it should be a good scenario engine if it has these features. 
Bosch VMS
vehicle management system (VMS) is a cloud-native framework from Bosch, which is used to meet requirements similar to Nvidia's solution: to bring a closed loop (CI/CD) from road-test data collection and data analysis to fleet management. they have a few applications based on VMS:

fleet battery management (FBM): for a single vehicle's diagnostics and prediction; for the EV market, FBM can also serve as a certification for second-hand EV dealers
road coefficient system (RCS): Bosch has both an in-vehicle data collection box and a cloud server; RCS will be treated as an additional sensor for ADS in production
VMS itself: Bosch would like to position VMS as the PLM for ADS, from design and test to deployment, and it should be easy to integrate many dev tools, e.g. labeling, simulation e.t.c

what about safety
as mentioned previously, about 80% of the Tesla FSD chip handles AI computing, Nvidia Xavier has about 50% GPU, and Mobileye has very limited support for AI. so Tesla is the most AI-aggressive, then Nvidia, and Mobileye is the most conservative, which makes OEMs regard the Mobileye solution as safer; but AI performs better in perception, so how to balance the two? this is where I see the greatness of Mobileye's new concept, responsibility sensitive safety (RSS): RSS can be used as the ADS safety boundary, while inside the boundary either AI or classical CV provides the horsepower. with a lot of research on mixing traditional algorithms with AI algorithms, RSS sounds like a good answer, and it would be nice to build a general RSS-mixing-AI (RMA) framework.]]></content>
</entry>
<entry>
<title><![CDATA[X11 GUI in docker]]></title>
<url>%2F2019%2F11%2F29%2FX11-GUI-in-docker%2F</url>
<content type="text"><![CDATA[XorgXorg is client-server architecture, including Xprotocol, Xclient, Xserver. Linux itself has no graphics interface, all GUI apps in Linux is based on X protcol. Xserver used to manage the Display device, e.g. monitor, Xserver is responsible for displaying, and send the device input(e.g. keyboard click) to Xclient. Xclient, or X app, which includes grahics libs, e.g. OpenGL, Vulkan e.t.c xauthorityXauthority file can be found in each user home directory and is used to store credentials in cookies used by xauth for authentication of X sessions. Once an X session is started, the cookie is used to authenticate connections to that specific display. You can find more info on X authentication and X authority in the xauth man pages (type man xauth in a terminal). if you are not the owner of this file you can’t login since you can’t store your credentials there. when Xorg starts, .Xauthority file is send to Xorg, review this file by xauth -f ~/.Xauthority ubuntu@ubuntu:~$ xauth -f ~/.XauthorityUsing authority file /home/wubantu/.Xauthorityxauth> listubuntu/unix:1 MIT-MAGIC-COOKIE-1 ee227cb9465ac073a072b9d263b4954eubuntu/unix:0 MIT-MAGIC-COOKIE-1 71cdd2303de2ef9cf7abc91714bbb417ubuntu/unix:10 MIT-MAGIC-COOKIE-1 7541848bd4e0ce920277cb0bb2842828 Xserver is the host who will used to display/render graphics, and the other host is Xclient. if Xclient is from remote host, then need configure $DISPLAY in Xserver. To display X11 on remote Xserver, need to copy the .Xauthority from Xserver to Xclient machine, and export $DISPLAY and $XAUTHORITY 12export DISPLAY={Display number stored in the Xauthority file}export XAUTHORITY={the file path of .Xauthority} xhostXhost is used to grant access to Xserver (on your local host), by default, the local client can access the local Xserer, but any remote client need get granted first through Xhost. taking an example, when ssh from hostA to hostB, and run glxgears in this ssh shel. for grahics/GPU resources, hostA is used to display, so hostA is the Xserver. x11 forwardingwhen Xserver and Xclient are in the same host machine, nothing big deal. but Xserver, Xclient can totally be on different machines, as well as Xprotocol communication between them. this is how SSH -X helps to run the app in Xclient, and display in Xserver, which needs X11 Forwarding. test benchmark12ssh 192.16.0.13xeyes /tmp/.X11-unixthe X11(xorg) server communicates with client via some kind of reliable stream of bytes. A Unix-domain socket is like the more familiar TCP ones, except that instead of connecting to an address and port, you connect to a path. You use an actual file (a socket file) to connect. srwxrwxrwx 1 root root 0 Nov 26 08:49 X0 the s in front of the permissions, which means its a socket. If you have multiple X servers running, you’ll have more than one file there. is where X server put listening AF_DOMAIN sockets. DISPLAY deviceDISPLAY format: hostname: displaynumber.screennumber hostname is the hostname or hostname IP of Xserver displaynumber starting from 0screennumber starting from 0 when using TCP(x11-unix protocol only works when Xclient and Xserver are in the same machine), displaynumber is the connection port number minus 6000; so if displaynumber is 0, namely the port is 6000. DISPLAY refers to a display device, and all graphics will be displayed on this device.by deafult, Xserver localhost doesn’t listen on TCP port. run: sudo netstat -lnp | grep "6010", no return. 
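a small sketch of that check, driven by whatever $DISPLAY currently is (the 6000 offset is just the X11 convention described above):

```shell
# derive the display number from $DISPLAY (e.g. ":0" or "localhost:10.0")
disp=${DISPLAY##*:}; disp=${disp%%.*}
echo "display number: $disp, TCP port would be $((6000 + disp))"
# local clients normally use the unix socket instead of TCP
ls -l /tmp/.X11-unix/X${disp}
# is anything listening on that TCP port? (usually nothing, as noted above)
sudo netstat -lnp | grep ":$((6000 + disp))"
```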
how to configure Xserver listen on TCP 1Add DisallowTCP=false under directive [security] in /etc/gdm3/custom.conf file. Now open file /etc/X11/xinit/xserverrc and change exec /usr/bin/X -nolisten tcp to exec /usr/bin/X11/X -listen tcp. Then restart GDM with command sudo systemctl restart gdm3. To verify the status of listen at port 6000, issue command ss -ta | grep -F 6000. Assume that $DISPLAY value is :0. virtual DISPLAY devicecreating a virtual display/monitoradd fake display when no Monitor is plugged in Xserver broadcastthe idea behind is to stand in one manager(Xserver) machine, and send command to a bunch of worker(Xclient) machines. the default way is all Xclient will talk to Xserver, which eat too much GPU and network bandwith resources on manager node. so it’s better that each worker node will do the display on its own. and if there is no monitor on these worker nodes, they can survive with virtual display. xvfbxvfb is the virtual Xserver solution, but doesn’t run well(need check more) nvidia-xconfigconfigure X server to work headless as well with any monitor connected unity headlessenv setupto test with docker, vulkan, ssh, usually need the following packages: vulkan dev envsudo add-apt-repository ppa:graphics-drivers/ppa sudo apt upgrade apt-get install libvulkan1 vulkan vulkan-utils sudo apt install vulkan-sdk nvidia envinstall nvidia-driver, nvidia-container-runtime install mesa-utils #glxgears docker envinstall docker run glxgear/vkcube/lgsvl in docker through ssh tunnelthere is a very nice blog: Docker x11 client via SSH, disccussed the arguments passing to the following samples run glxgearglxgear is OpenGL benchmark test. 12ssh -X -v [email protected] docker run --runtime=nvidia -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v "$HOME/.Xauthority:/root/.Xauthority" --net=host 192.168.0.10:5000/glxgears if seting $DISPLAY=localhost:10.0 , then the gears will display at master node(ubuntu) if setting $DISPLAY=:0, then the gears will display at worker node(worker) and w/o /tmp/.X11-unix it works as well. run vkcubevkcube is Vulkan benchmark test. 123ssh -X -v [email protected] DISPLAY=:0sudo docker run --runtime=nvidia -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v "$HOME/.Xauthority:/root/.Xauthority" --net=host 192.168.0.10:5000/vkcube in this way, vkcube is displayed in worker node(namely, using worker GPU resource), manager node has no burden at all. if $DISPLAY=localhost:10.0, to run vkcube, give errors: No protocol specified Cannot find a compatible Vulkan installable client driver (ICD). Exiting ... looks vulkan has limitation. run lgsvl123export DISPLAY=:0sudo docker run --runtime=nvidia -ti --rm -p 8080:8080 -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v "$HOME/.Xauthority:/root/.Xauthority" --net=host 192.168.0.10:5000/lgsvl /bin/bash ./simulator works well! the good news as if take the manager node as the end user monitor, and all worker nodes in cloud, without display, then this parameters will be used in docker service create to host in the swarm. so the next step is to vitual display for each worker node. referwiki: Xorg/XserverIBM study: Xwindowscnblogs: run GUI in remote serverxorg.conf in ubuntuconfigure XauthorityX11 forwarding of a GUI app running in dockercnblogs: Linux DISPLAY skillsnvidia-runtime-container feature: Vulkan support]]></content>
<tags>
<tag>Docker</tag>
<tag>X11</tag>
</tags>
</entry>
<entry>
<title><![CDATA[what about Tesla]]></title>
<url>%2F2019%2F11%2F26%2Fwhat-about-Tesla%2F</url>
<content type="text"><![CDATA[Tesla is ROCKING the industry. OTA, camera only, fleet learning, shadow mode, Autopilot, Gega Factory, Cybertruck etc. There is a saying: “看不到,看不懂,追不上(can’t see it; can’t understand it; can’t chase it)”. I have to say most Tesla news are more exciting than news from traditional OEMs. and my best wishes to Tesla to grow greater. Tesla timeline2019 release Cybertruck Tesla software V10.0 OTA: Smart Summon(enable vehicle to navigate a parking lot and come to them or their destination of choice, as long as their car is within their line of sight) Driving Visualization(HMI) Automatic lane change lane departure avoidance: Autopilot will warn the driver and slow down the vehicle emergency lane departure avoidance: Autopilot will steer the vehciel back into the driving lane if the off-lane may lead to collision Model 3 safety reward from IIHS and Euro NCAP Tesla Insurance Megapack: battery storage for massive usage V3 super charging station: more powerful station and pre-heating battery Powerpack: energy storage system in South Austrilia Model 3 release (March) and the most customer satisfied vehicles in China cut-off 7% employees globally(Jan) 2018 race mode Model 3 Model 3 the lowest probability of injury by NHTSA: what make Model 3 safe Tesla V9.0 OTA: road status info climate control Navigate on Autopilot Autosteer and Auto lane change combination blindspot warning(when turn signal in engaged but a vehicle or obstacle is detected in the target lane) use high occupancy vehicle(HOV) lane obstacle aware acceleration(if obstacle detected, acc is automatically limited) Dashcam (record and store video footage) Tesla privatization (Aug) 2017 super charing station 10000 globally collabration with Panasonic to produce battery at Buffalo, 1Mpw 2016 purchase SolarCity purchase Grohmann Enginering(German): highly automatic manufacture massive product of Tesla vehicles with hardwares to support fully self driving(Oct) 8 cameras to support 360 view, in front 250 meters env detection 12 Ultrasonic front Radar ADS HAD (x40 powerful than previous) ADS algorithm: deep learning network combine with vision, radar and ultrasonic(AEB, collision warning, lane keeping, ACC is not included yet) Autopilot 8.0 OTA, or namely “see” the world through Radar Tesla’s master plane 2 deadly accident across the truck(2016.7) HEPA defense “biological weapon” accept reservation for Model 3(march) Tesla 7.1.1 OTA: remote summon Tesla 7.1 OTA: vertical parking speed gentelly in living house area highway ACC, traffic jam following more road info in HMI, e.t.c truck, bus, motobike 2015 Autopilot 7.0 update: Autopark(requires driver in the car and only parallel parking) Autosteer Auto lane changing UI refresh Automatic emergency steering side collision warning Autopilot evoluvationin a nutshell, Autopilot is dynamic cruise control(ACC) + Autosteer + auto lane chang. Autopilot 7.0 relied primarily on the front-facing camera. radar hasn’t been used primarily in 7.0 was due to false positives(wrong detection). but in 8.0 with fleet learning. 8.0 made radar the main sensor input. almost entirely eliminate the false positive – the false braking events – and enable the car to initiate braking no matter what the object is as long as it is not large and fluffy but any large, or metallic or dense, the radar system is able to detect and initiate a braking event. 
both when Autopilot active or not(then AEB) even if the vision system doesn’t recognize the object, it actually doesn’t matter what the object is(while vision does need to know what the thing is), it just knows there is somehting dense. fleeting learning will mark the geolocation of where all the false alarm occurs, and what the shape of that object. so Tesla system know at a particular position at a particular street or highway, if you see a radar object of a following shape - don’t worry it’s just a road sign or bridge or a Christmas decoration. basically marking these locations as a list of exceptions. the radar system can track 2 cars/obstacles ahead and imporove the cut-in , cut-off reponse. so in case the car in front suddenly swerve out of the way of an obstacle. the limit of hardware is reaching, but there will be still a quite improvement as the software and data would improve quite amount. but still perfect safety is really an impossible goal, it’s really about improving the probability of safety. in Autopilot 9.0, Navigate on Autopilot(Beta) intelligently suggests lane changes to keep you on your route in addition to making adjustments so you don’t get stuck behind slow cars or trucks. Navigate on Autopilot will also automatically steer toward and take the correct highway interchanges and exits based on your destination. Autopilot is keeping evaluation with more exicting features: traffic light and stop signs detection enhanced summon naviagte multi-story parking lots automaticly send off vehicle to park Autopilot on city streets Robotaxi service Tesla HardwareHardware 1.0or Autopliot 1 or AP1, it was a joint development between Mobileye and Tesla. It featured a single front-facing camera and radar to sense the environment plus Mobileye’s hardware and software to control the driving experience. AP1 was so good that when Tesla decided to build their own system, it took them years to catch up to the baseline Autopilot functionality in AP1. Mobileye EyeQ3 is good to mark/label in free space, intuitive routing, obstacle-avoid, and traffic signal recognization etc. but it has a few limitations to env light, and reconstruct 3D world from 2D images etc does work as expect all the time. and EyeQ3 detects objects with traditional algorithms, not cool! AP1 Hardware Suite: Front camera (single monochrome) Front radar with range of 525 feet / 160 meters 12 ultrasonic sensors with 16 ft range / 5 meters Rear camera for driver only (not used in Autopilot) Mobileye EyeQ3 computing platform AP1 Core features: Traffic-Aware Cruise Control (TACC), start & stop Autosteer (closed-access roads, like freeways) Auto Lane Change (driver initiated) Auto Park Summon Hardware 2.0AP2 highlights machine learning/neurual networks with camera inputs, so with more sensors and more powerful computing platforms. 
AP2 Hardware Suite: Front cameras (3 cameras, medium, narrow and wide angle) Side cameras (4 total, 2 forward and 2 rear-facing, on each side) Rear camera (1 rear-facing) Front radar with range of 525 feet / 160 meters 12 ultrasonic sensors with 26 ft range / 8 meters NVIDIA DRIVE PX 2 AI computing platform AP2 Core features: Traffic-Aware Cruise Control (TACC), start & stop Autosteer (closed-access roads, like freeways) Auto Lane Change (driver initiated) Navigate on Autopilot (on-ramp to off-ramp) Auto Park Summon there was AP2.5 update, with redundant NVIDIA DRIVE PX2 and forward radar with longer range (170m) Hardware 3.0or Full Self Driving(FSD) Computer, Telsa guysSterling Anderson from 2015 - 2016, director of Autopilot program. Chris Latter in early 2017, VP for Autopilot software Jim Keller, from 2016 to 2017, VP for Autopilot hardware David Nister, from 2015 to 2017, VP for Autopilot Stuart Bowers from 2018 -2019, VP for Autopilot Pete Bannon, from 2016 to now, Director for Autopilot hardware Andrej Karpathy, from 2017 to now, Director of AI Tesla in mediaTeslaRati Tesla official Telsa motor club Autopilot review zhihu: Tesla Autopilot history 2017 Mercedes-Benz E vs 2017 Tesla Model S Tesla’s Autopilot 8.0: why Elon Mush says perfect safety is still impossible Transcript: Elon Musk’s press conference about Tesla Autopilot under v8.0 update Tesla reveals all the details of its autopilot and its software v7.0 Software update 2018.39 Tesla V10: first look at release notes and features Tesla Autopilot’s stop sign, traffic light recognition and response is operating in shadow mode Tesla’s full self-driving suite with enhanced summon Tesla’s Robotaxi service will be an inevitable player in the AV taxi race Tesla Autopilot AP1 vs AP2 vs AP3 Tesla Hardware 3 Detailed Future Tesla Autopilot update coming soon Autopilot and full self driving capability features multi view Tesla FSD chips zhihu: EyeQ5 vs Xavier vs FSD]]></content>
<tags>
<tag>Tesla</tag>
</tags>
</entry>
<entry>
<title><![CDATA[deploy lgsvl in docker swarm-2]]></title>
<url>%2F2019%2F11%2F21%2Fdeploy-lgsvl-in-docker-swarm-2%2F</url>
<content type="text"><![CDATA[backgroundpreviously tried to deploy lgsvl by docker compose v3, which at first sounds promising, but due to lack of runtime support, which doesn’t work any way. docker service create --generic-resource is another choice. docker service optionsdocker service support a few common options --workdir is the working directory inside the container --args is used to update the command the service runs --publish <Published-Port>:<Service-Port> --network --mount --mode --env –config docker service create with generic-resourcegeneric-resourcecreate services requesting generic resources is supported well: 1234$ docker service create --name cuda \ --generic-resource "NVIDIA-GPU=2" \ --generic-resource "SSD=1" \ nvidia/cuda tips: acutally the keyword NVIDIA-GPU is not the real tags. generic_resource is also supported in docker compose v3.5: 1234generic_resources: - discrete_resource_spec: kind: 'gpu' value: 2 --generic-resource has the ability to access GPU in service, a few blog topics: GPU Orchestration Using Docker access gpus from swarm service first tryfollow accessing GPUs from swarm service. install nvidia-container-runtime and install docker-compose, and run the following script: 12345678910export GPU_ID=`nvidia-smi -a | grep UUID | awk '{print substr($4,0,12)}'`sudo mkdir -p /etc/systemd/system/docker.service.dcat EOF | sudo tee --append /etc/systemd/system/docker.service.d/override.conf[Service]ExecStart=ExecStart=/usr/bin/dockerd -H fdd:// --default-runtime=nvidia --node-generic-resource gpu=${GPU_ID}EOFsudo sed -i 'swarm-resource = "DOCKER_RESOURCE_GPU"' /etc/nvidia-container-runtime/config.tomlsudo systemctl daemon-reloadsudo systemctl start docker to understand supported dockerd options, can check here, then run the test as: docker service create --name vkcc --generic-resource "gpu=0" --constraint 'node.role==manager' nvidia/cudagl:9.0-base-ubuntu16.04 docker service create --name vkcc --generic-resource "gpu=0" --env DISPLAY=unix:$DISPLAY --mount src="X11-unix",dst="/tmp/.X11-unix" --constraint 'node.role==manager' vkcube which gives the errors: 1/1: no suitable node (1 node not available for new tasks; insufficient resourc… 1/1: no suitable node (insufficient resources on 2 nodes) if run as, where GPU-9b5113ed is the physical GPU ID in node: docker service create --name vkcc --generic-resource "gpu=GPU-9b5113ed" nvidia/cudagl:9.0-base-ubuntu16.04 which gives the error: invalid generic-resource request `gpu=GPU-9b5113ed`, Named Generic Resources is not supported for service create or update these errors are due to swarm cluster can’t recognized this GPU resource, which is configured in /etc/nvidia-container-runtime/config.toml second tryas mentioined in GPU orchestration using Docker, another change can be done: ExecStart=/usr/bin/dockerd -H unix:///var/run/docker.sock --default-runtime=nvidia --node-generic-resource gpu=${GPU_ID} which fixes the no suitable node issue, but start container failed: OCI.. 
1234root@ubuntu:~# docker service ps vkcc ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTSorhcaxyujece vkcc.1 nvidia/cudagl:9.0-base-ubuntu16.04 ubuntu Ready Ready 3 seconds ago e001nd557ka6 \_ vkcc.1 nvidia/cudagl:9.0-base-ubuntu16.04 ubuntu Shutdown Failed 3 seconds ago "starting container failed: OC…" check daemon log with sudo journalctl -fu docker.service, which gives: 1Nov 21 13:07:12 ubuntu dockerd[1372]: time="2019-11-21T13:07:12.089005034+08:00" level=error msg="fatal task error" error="starting container failed: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/9eee7ac30a376ee8f59704f7687455bfb163e5ea3dd6d09d24fbd69ca2dfaa4e/log.json: no such file or directory): nvidia-container-runtime did not terminate sucessfully: unknown" module=node/agent/taskmanager node.id=emzw1f9293rwdk97ki7gfqq1q service.id=qdma7vr1g519lz9hx2y1fen9o task.id=ex1l4wy61kvughns5uzo6qgxy third tryfollowing issue #141 123456nvidia-smi -a | grep UUID | awk '{print "--node-generic-resource gpu="substr($4,0,12)}' | paste -d' ' -ssudo systemctl edit docker[Service]ExecStart=ExecStart=/usr/bin/dockerd -H fd:// --default-runtime=nvidia <resource output from the above> and run: docker service create --name vkcc --generic-resource "gpu=1" --env DISPLAY --constraint 'node.role==manager' nvidia/cudagl:9.0-base-ubuntu16.04 it works with output verify: Service converged. However, when test image with vucube or lgsvl it has errors: 1Nov 21 19:33:20 ubuntu dockerd[52334]: time="2019-11-21T19:33:20.467968047+08:00" level=error msg="fatal task error" error="task: non-zero exit (1)" module=node/agent/taskmanager node.id=emzw1f9293rwdk97ki7gfqq1q service.id=spahe4h24fecq11ja3sp8t2cn task.id=uo7nk4a3ud201bo9ymmlpxzr3 to debug the non-zero exit (1) : docker service ls #get the dead service-ID docker [service] inspect r14a68p6v1gu # check docker ps -a # find the dead container-ID docker logs ff9a1b5ca0de # check the log of the failure container it gives: Cannot find a compatible Vulkan installable client driver (ICD) check the issue at gitlab/nvidia-images forth trydocker service create --name glx --generic-resource "gpu=1" --constraint 'node.role==manager' --env DISPLAY --mount src="X11-unix",dst="/tmp/.X11-unix" --mount src="tmp",dst="/root/.Xauthority" --network host 192.168.0.10:5000/glxgears BINGO !!!!! it does serve openGL/glxgears in service mode. However, there are a few issues: constraint to manager node require host network the X11-unix and Xauthority are from X11 configuration, which need more study. also network parameter need to expand to ingress overlay mostly, vulkan image still can’t run, with the same error: Cannot find a compatible Vulkan installable client driver (ICD) generic-resource support discussionmoby issue 33439: add support for swarmkit generic resources how to advertise Generic Resources(republish generic resources) how to request Generic Resources nvidia-docker issue 141: support for swarm mode in Docker 1.12 docker issue 5416: Add Generic Resources Generic resources Generic resources are a way to select the kind of nodes your task can land on. In a swarm cluster, nodes can advertise Generic resources as discrete values or as named values such as SSD=3 or GPU=UID1, GPU=UID2. The Generic resources on a service allows you to request for a number of these Generic resources advertised by swarm nodes and have your tasks land on nodes with enough available resources to statisfy your request. 
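to see what a node is actually advertising after the dockerd flags above, something like this can help (the `--format` path into the node spec is from memory, so treat it as an assumption):

```shell
# run on the manager: dump the generic resources each node advertises
for n in $(docker node ls -q); do
  echo "== $n =="
  docker node inspect "$n" \
    --format '{{json .Description.Resources.GenericResources}}'
done
```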
If you request Named Generic resource(s), the resources selected are exposed in your container through the use of environment variables. E.g: DOCKER_RESOURCE_GPU=UID1,UID2 You can only set the generic_resources resources’ reservations field. overstack: schedule a container with swarm using GPU memory as a constraint label swarm nodes $ docker node update --label-add <key>=<value> <node-id> compose issue #6691 docker-nvidia issue #141 SwarmKitswarmkit also support GenericResource, please check design doc 1234$ # Single resource$ swarmctl service create --name nginx --image nginx:latest --generic-resources "banana=2"$ # Multiple resources$ swarmctl service create --name nginx --image nginx:latest --generic-resources "banana=2,apple=3" ./bin/swarmctl service create --device /dev/nvidia-uvm --device /dev/nvidiactl --device /dev/nvidia0 --bind /var/lib/nvidia-docker/volumes/nvidia_driver/367.35:/usr/local/nvidia --image nvidia/digits:4.0 --name digits swarmkit add support devices option refermanage swarm service with config UpCloud: how to configure Docker swarm Docker compose v3 to swarm cluster deploy docker compose services to swarm docker deploy doc alexei-led github Docker ARG, ENV, .env – a complete guide]]></content>
<tags>
<tag>lgsvl</tag>
<tag>Docker</tag>
</tags>
</entry>
<entry>
<title><![CDATA[deploy lgsvl in docker swarm]]></title>
<url>%2F2019%2F11%2F19%2Fdeploy-lgsvl-in-docker-swarm%2F</url>
<content type="text"><![CDATA[Backgroundpreviously, vulkan in docker gives the way to run vulkan based apps in Docker; this post is about how to deploy a GPU-based app in docker swarm. Docker swarm has the ability to deploy apps(service) in scalability. Docker registryDocker Registry is acted as a local Docker Hub, so the nodes in the LAN can share images. update docker daemon with insecure-registries modify /etc/docker/daemon.json in worker node: "insecure-registries": ["192.168.0.10:5000"] systemctl restart docker start registry service in manager node docker service create –name registry –publish published=5000,target=5000 registry:2 access docker registry on both manager node and worker node : $ curl http://192.168.0.10:5000/v2/ #on manager node $ curl http://192.168.0.10:5000/v2/ #on worker node insecure registry is only for test; for product, it has to with secure connection, check the official doc about deploy a registry server upload images to this local registry hubdocker tag stackdemo 192.168.0.10:5000/stackdemo docker push 192.168.0.10:5000/stackdemo:latest curl http://192.168.0.10:5000/v2/_catalog on worker run: docker pull 192.168.0.10:5000/stackdemo docker run -p 8000:8000 192.168.0.10:5000/stackdemo the purpose of local registry is to build a local docker image file server, to share in the cluster server. Deploy composedocker-compose builddocker-compose build is used to build the images. docker-compose up will run the image, if not exiting, will build the image first. for lgsvl app, the running has a few parameters, so directly run docker-compose up will report no protocol error. run vkcube in docker-composedocker-compose v2 does support runtime=nvidia, by appending the following to /etc/docker/daemon.json: 123456"runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } to run vkcube in compose by: xhost +si:localuser:root docker-compose up the docker-compose.yml is : 12345678910111213version: '2.3'services: vkcube-test: runtime: nvidia volumes: - /tmp/.X11-unix:/tmp/.X11-unix environment: - NVIDIA_VISIBLE_DEVICES=0 - DISPLAY# image: nvidia/cuda:9.0-base image: vkcube# build: . however, currently composev3 doesn’t support NVIDIA runtime, who is required to run stack deploy. support v3 compose with nvidia runtimeas discussed at #issue: support for NVIDIA GPUs under docker compose: 123456789services: my_app: deploy: resources: reservations: generic_resources: - discrete_resource_spec: kind: 'gpu' value: 2 update daemon.json with node-generic-resources, an official sample of compose resource can be reviewed. but so far, it only reports error: ERROR: The Compose file './docker-compose.yml' is invalid because: services.nvidia-smi-test.deploy.resources.reservations value Additional properties are not allowed ('generic_resources' was unexpected` deploy compose_V3 to swarmdocker compose v3 has two run options, if triggered by docker-compose up, it is in standalone mode, will all services in the stack is host in current node; if triggered through docker stack deploy and current node is the manager of the swarm cluster, the services will be hosted in the swarm. btw, docker compose v2 only support standalone mode. 
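before the stack example below, the insecure-registry setup from the beginning of this post in one place (a sketch only, assuming the registry really runs at 192.168.0.10:5000 as above; insecure registries are for testing, not production):

```shell
# on every worker node: trust the plain-HTTP registry, then restart docker
# (this overwrites an existing daemon.json; merge by hand if the file already
#  carries other keys, e.g. the nvidia runtime settings)
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "insecure-registries": ["192.168.0.10:5000"]
}
EOF
sudo systemctl restart docker

# tag and push a locally built image so all swarm nodes can pull it
docker tag stackdemo 192.168.0.10:5000/stackdemo
docker push 192.168.0.10:5000/stackdemo:latest
curl http://192.168.0.10:5000/v2/_catalog
```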
take an example from the official doc: deploy a stack to swarm: 123456789docker service create --name registry --publish published=5000,target=5000 registry:2docker-compose up -ddocker-compose psdocker-compose down --volumesdocker-compose push #push to local registrydocker stack deploydocker stack services stackdemodocker stack rm stackdemodocker service rm registry after deploy stackdemo in swarm, check on both manager node and worker node: curl http://192.168.0.13:8000 curl http://192.168.0.10:8000 docker service runtimedocker run can support runtime env through -e in CLI or env-file, but actually docker service doesn’t have runtime env support. docker compose v3 give the possiblity to configure the runtime env and deploy the service to clusters, but so far v3 compose doesn’t support runtime=nvidia, so not helpful. I tried to run vkcube, lgsvl with docker service: docker service create --name vkcc --env NVIDIA_VISIBLE_DEVICES=0 --env DISPLAY=unix:$DISPLAY --mount src="/.X11-unix",dst="/tmp/.X11-unix" vkcube docker service create --name lgsvl -p 8080:8080 --env NVIDIA_VISIBLE_DEVICES=0 --env DISPLAY=unix$DISPLAY --mount src="X11-unix",dst="/tmp/.X11-unix" lgsvl for vkcube, the service converged, but no GUI display; for lgsvl, the service failed. Docker deploydocker deploy is used to deploy a complete application stack to the swarm, which accepts the stack application in compose file, docker depoy is in experimental, which can be trigger in /etc/docker/daemon.json, check to enable experimental features a sample from jianshu docker-compose.yml: 1234567891011121314151617181920212223242526272829303132version: "3"services: nginx: image: nginx:alpine ports: - 80:80 deploy: mode: replicated replicas: 4 visualizer: image: dockersamples/visualizer ports: - "9001:8080" volumes: - "/var/run/docker.sock:/var/run/docker.sock" deploy: replicas: 1 placement: constraints: [node.role == manager] portainer: image: portainer/portainer ports: - "9000:9000" volumes: - "/var/run/docker.sock:/var/run/docker.sock" deploy: replicas: 1 placement: constraints: [node.role == manager] a few commands to look into swarm services: 12345docker stack deploy -c docker-compose.yml stack-demo docker stack services stack-demo docker service inspect --pretty stack-demo # inspect service in the swarmdocker service ps <service-id> # check which nodes are running the servicedocker ps #on the special node where the task is running, to see details about the container summaryat this moment, it’s not possible to use v3 compose.yml to support runtime=nvidia, so using v3 compose.yml to depoly a gpu-based service in swarm is blocked. the nature swarm way maybe the right solution. refer run as an insecure registry https configure for docker registry in LAN a docker proxy for your LAN alex: deploy compose(v3) to swarm monitor docker swarm docker swarm visulizer swarm mode with docker service inspect a service on the swarm voting example enable compose for nvidia-docker nvidia-docker-compose compose issue: to support nvidia under Docker compose potential solution for composev3 with runtime swarmkit: generic_resources Docker ARG, ENV, .env – a complete guide]]></content>
<tags>
<tag>Docker</tag>
<tag>lg-sim</tag>
</tags>
</entry>
<entry>
<title><![CDATA[vulkan in docker to support new lgsvl]]></title>
<url>%2F2019%2F11%2F13%2Fvulkan-in-docker-to-support-new-lgsvl%2F</url>
<content type="text"><![CDATA[backgroundDocker is a great idea to package apps, the first time to try play with docker swarm. lg-sim has updated to HDRP rendering, which has a higher quality, as well requires more GPU features, Vulkan. currently Vulkan is not supported by standard docker neither nvidia-dcker, which is deprecated after docker enginer > 19.03. there is nvidia images, the special one we are interesed is vulkan docker, and there is an related personal project, which is based on the cudagl=10.1, which is not supported by non-Tesla GPU. so for our platform, which has only Quadra P2000 GPUs, the supported CUDA is 9.0, so we need to rebuild the vulkan docker based on CUDA9.0. check the vulkan dockerfile, instead of using cudagl:9.0, change to: FROM nvidia/cudagl:9.0-base-ubuntu16.04 after build the image, we can build the vulkan test samples. if no issue, load lg-sim into this vulkan-docker. a few lines may help: 12345/usr/lib/nvidia-384/libGLX_nvidia.s.0 /usr/share/vulkan/icd.d/proc/driver/nvidia/version new lgsvl in dockerthe previous lg-sim(2019.04) can be easily run in docker, as mentioned here. the above vulkan-docker image is the base to host lgsvl (2019.09). additionaly, adding vulkan_pso_cache.bin to the docker. the benefit of host lgsvl server in docker, is to access the webUI from host or remote. so the network should be configured to run as --net=host. if configure as a swarm overlay network, it should support swarm cluster. a few related issue can be checked at lgsvl issues hub. VOLUME in dockerfilethe following sample is from understand VOLUME instruction in Dockerfile create a Dockerfile as: 123FROM openjdkVOLUME vol1 /vol2CMD ["/bin/bash"] 12docker build -t vol_test . docker run --rm -it vol_test check in the container, vol1, vol2 does both exist in the running container. 123bash-4.2# ls bin dev home lib64 mnt proc run srv tmp var vol2 boot etc lib media opt root sbin sys usr vol1 also check in host terminal: 1234root@ubuntu:~# docker volume ls DRIVER VOLUME NAMElocal 0ffca0474fe0d2bf8911fba9cd6b5875e51abe172f6a4b3eb5fd8b784e59ee76local 7c03d43aaa018a8fb031ef8ed809d30f025478ef6a64aa87b87b224b83901445 and check further: 123root@ubuntu:/var/lib/docker/volumes# ls 0ffca0474fe0d2bf8911fba9cd6b5875e51abe172f6a4b3eb5fd8b784e59ee76 metadata.db7c03d43aaa018a8fb031ef8ed809d30f025478ef6a64aa87b87b224b83901445 once touch ass_file under container /vol1, we can find immediately in host machine at /var/lib/docker/volumes : 1234root@ubuntu:/var/lib/docker/volumes/0ffca0474fe0d2bf8911fba9cd6b5875e51abe172f6a4b3eb5fd8b784e59ee76/_data# ls -lt total 0-rw-r--r-- 1 root root 0 Nov 7 11:40 css_file-rw-r--r-- 1 root root 0 Nov 7 11:40 ass_file also if deleted file from host machine, it equally delete from the runnning container. The _data folder is also referred to as a mount point. Exit out from the container and list the volumes on the host. They are gone. We used the –rm flag when running the container and this option effectively wipes out not just the container on exit, but also the volumes. sync localhost folder to containerby default, Dockerfile can not map to a host path, when trying to bring files in from the host to the container during runtime. namely, The Dockerfile can only specify the destination of the volume. for example, we expect to sync a localshost folder e.g. attach_me to container, by cd /path/to/dockfile && docker run -v /attache_me -it vol_test. 
a new data volume named attach_me is, just like the other /vol1, /vol2 located in the container, but this one is totally nothing to do with the localhost folder. while a trick can do the sync: 1docker run -it -v $(pwd)/attach_me:/attach_me vol_test Both sides of the : character expects an absolute path. Left side being an absolute path on the host machine, right side being an absolute path inside the container. volumes in composewhich is only works during compose build, and has nothing to do with docker container. copy folder from host to container COPY in dockerfile ERROR: Service ‘lg-run’ failed to build: COPY failed: stat /var/lib/docker/tmp/docker-builder322528355/home/wubantu/zj/simulator201909/build: no such file or directory the solution is to keep the folder in Dockerfile’s current pwd; if else, Docker engine will look from /var/lib/docker/tmp. VOLUME summaryIf you do not provide a volume in your run command, or compose file, the only option for docker is to create an anonymous volume. This is a local named volume with a long unique id for the name and no other indication for why it was created or what data it contains. If you override the volume, pointing to a named or host volume, your data will go there instead. when VOLUME in DOCKERFILE, it actually has nothing to do with current host path, it actually generate something in host machine, located at /var/lib/docker/volumes/, which is nonreadable and managed by Docker Engine. also don’t forget to use --rm, which will delete the attached volumes in host when the container exit. warning: VOLUME breaks things understand docker-compose.ymlUnderstand and manage Docker container volumes what is vulkan SDK Graham blog]]></content>
<tags>
<tag>lgsvl</tag>
<tag>Docker</tag>
</tags>
</entry>
<entry>
<title><![CDATA[build cluster on Docker swarm]]></title>
<url>%2F2019%2F11%2F06%2Fbuild-cluster-on-Docker-swarm%2F</url>
<content type="text"><![CDATA[Docker micro services
key concepts in Docker: when deploying an application (lg-sim) to a swarm cluster as a service, the service is defined on a manager node, and the manager node dispatches units of work as tasks to the worker nodes. when creating a service, you specify which container image to use and which commands to execute inside the running containers. For replicated services, the swarm manager distributes a specific number of replica tasks among the nodes based upon the scale you set in the desired state. For global services, the swarm runs one task for the service on every available node in the cluster.

### Docker swarm CLI commands

```shell
docker swarm init
docker swarm join
docker service create --name --env --workdir --user
docker service inspect
docker service ls
docker service rm
docker service scale
docker service ps
docker service update --args
docker node inspect
docker node update --label-add
docker node promote/demote
# run on a worker
docker swarm leave
# run on the manager
docker node rm worker-node
docker ps                                # get the running container ID
docker exec -it containerID /bin/bash
docker run -it
docker-compose up/build/run
```

### delete unused Docker networks

as the Docker networks may confuse external access to the local network interfaces, sometimes the docker networks need to be removed:

```shell
docker network ls
docker network disconnect -f {network} {endpoint-name}
docker network rm
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
docker volume prune
docker network prune
```

the script above deletes the unused (non-active) docker networks; a still-active docker-related network interface can be deleted through:

```shell
sudo ip link del docker0
```

### access Docker service

a Docker container has its own virtual IP (e.g. 172.17.0.1) and ports, which are only reachable from the host machine; for external access, the host machine IP has to be mapped to the docker container, e.g. by `--publish-add`. the internal communication among docker nodes is configured by `advertise_addr` and `listen-addr`.

#### through IP externally

To publish a service's ports externally, use the `--publish <PUBLISHED-PORT>:<SERVICE-PORT>` flag. When a user or process connects to a service, any worker node running a service task may respond. taking an example from the [5mins series](https://www.cnblogs.com/CloudMan6/tag/Swarm/):

```shell
docker service create --name web_server --replicas=2 httpd
docker service ps web_server
# access the service only on the host machine, through the Docker IP
curl 172.17.0.1
docker service update --publish-add 8080:80 web_server
# access the service externally
curl http://hostmachineIP:8080
```

#### configure websocket protocol

the lg-sim server talks to the pythonAPI client through `websocket`, so it would be better if the service could also be published over websocket.

#### publish httpd server as a swarm service

```shell
docker service create --name web_server --publish 880:80 --replicas=2 httpd
```

the container IP is the IP of the network interface `docker0` (e.g. 172.17.0.1), which can be checked through `ifconfig`. `80` is the default port used by httpd, which is mapped to the host machine port `880`.
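to double-check how the published port is mapped, a small sketch (assuming the `web_server` service above; the `--format` path into the service endpoint spec is from memory, so treat it as an assumption):

```shell
# show the swarm ingress port mapping for the service
docker service inspect --format '{{json .Endpoint.Ports}}' web_server
# expected: TargetPort 80 published as 880 on every swarm node
```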
so any of the following will respond correctly:

```shell
curl 172.17.0.1:880
curl localhost:880
curl 192.168.0.1:880
curl 192.168.0.13:880   # from the other docker node
```

#### publish lg-sim as a swarm service

the previous version (2019.04) of lg-sim doesn't have an http server built in; since 2019.07 it ships a `Nancy http server`, which is a great step toward dockerizing the lg-sim server.

### manage data in Docker

`Volumes` are stored in a part of the host filesystem located at `/var/lib/docker/volumes`, which is actually managed by Docker rather than by the host machine. Volumes are the preferred way to [persist data in Docker](https://docs.docker.com/v17.09/engine/admin/volumes/#more-details-about-mount-types) containers and services. some use cases of volumes:

* once created, the volume still exists even after the container stops or is removed
* multiple containers can mount the same volume simultaneously
* when data needs to be stored on a remote host
* when data needs to be backed up, restored, or migrated from one Docker host to another

#### RexRay

an even more robust way is to separate the volume manager from the storage provider manager, e.g. [Rex-Ray](https://rexray.readthedocs.io/en/v0.3.3/):

```shell
docker service create --name web_s \
    --publish 8080:80 \
    --mount "type=volume,volume-driver=rexray,source=web_data,target=/usr/local/apache2/htdocs" \
    httpd
docker exec -it containerID
ls -ld /usr/local/apache2/htdocs
chown www-data:www-data test
# visit
curl http://192.168.0.1:8080
docker inspect containerID
```

`source` is the name of the data volume; if empty, a new volume is created. `target` is where the data volume is mounted (`/usr/local/apache2/htdocs`) in each container. with RexRay, data volume update, scaling and failover (when any node crashes, the data volume won't be lost) are also taken care of.

refer
5mins in Docker
Docker swarm in and out
what is swarm advertise-addr
can run exec in swarm
execute a command within docker swarm service]]></content>
<tags>
<tag>Docker</tag>
</tags>
</entry>
<entry>
<title><![CDATA[Linux network tool]]></title>
<url>%2F2019%2F11%2F05%2FLinux-network-tool%2F</url>
<content type="text"><![CDATA[Linux network commands

### ip

the ip command is used to edit and display the configuration of network interfaces, routing, and tunnels. On many Linux systems it replaces the deprecated ifconfig command.

```shell
ip link del docker0                               # delete a virtual network interface
ip addr add 192.168.0.1 dev eth1                  # assign an IP to a specific interface (eth1)
ip addr show                                      # check network interfaces
ip addr del 192.168.0.1 dev eth1
ip link set eth1 up/down
ip route [show]
ip route add 192.168.0.1 via 10.10.20.0 dev eth0  # add a static route to 192.168.0.1
ip route del 192.168.0.1                          # remove the static route
ip route add default via 192.168.0.1              # add the default gateway
```

### netstat

netstat is used to display active sockets/ports for each protocol (tcp/ip):

```shell
netstat -lat
netstat -us
```

### nmcli

nmcli is a Gnome command-line tool to control NetworkManager and report network status:

```shell
nmcli device status
nmcli connection show
```

### route

```shell
route                                            # print the routing table; ip route is the modern version
route add -net sample-net gw 192.168.0.1
route del -net link-local netmask 255.255.0.0
ip route flush                                   # flush the routing table
```

### tracepath

tracepath traces the path to a network host, discovering the MTU along this path. a modern version is traceroute.

```shell
tracepath 192.168.0.1
```

### networking service

```shell
systemctl restart networking
/etc/init.d/networking restart
# or
service NetworkManager stop
```

### network interface

configured at `/etc/network/interfaces`. `eno1` is the onboard Ethernet (wired) adapter. if a machine already has `eth1` in its config file, the second adapter will be named `eno1` rather than `eth2`.

[ifconfig](https://www.ibm.com/support/knowledgecenter/ssw_aix_71/i_commands/ifconfig.html) is used to set up network interfaces such as Loopback and Ethernet. a network interface is a software interface to networking hardware, either physical or virtual: a physical interface such as `eth0` is the Ethernet network card; virtual interfaces include `Loopback`, `bridges`, `VLANs` e.t.c.

```shell
ifconfig -a
ifconfig eth0                      # check a specific network interface
ifconfig eth0 192.168.0.1          # assign a static IP address to the interface
ifconfig eth0 netmask 255.255.0.0  # assign the netmask
ifconfig docker0 down/up
```

ifconfig was later replaced by the `ip` command.

### why enp4s0f2 instead of eth0

[change back to eth0](https://www.itzgeek.com/how-tos/mini-howtos/change-default-network-name-ens33-to-old-eth0-on-ubuntu-16-04.html)

```shell
lspci | grep -i "net"
dmesg | grep -i eth0
ip a
sudo vi /etc/default/grub
# GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"
update-grub
# update /etc/network/interfaces:
#   auto eth0
#   iface eth0 inet static
sudo reboot
```

### Unknown interface enp4s0f2

because /etc/network/interfaces has an auto enp4s0f2 line, this network interface is always created when the networking service restarts.

### ping hostname with docker0

usually there are multiple network interfaces (eth0, docker0, bridge...) on a remote host. when pinging this remote host while a docker network (docker0) exists, the traffic may by default go through docker0, which may not be the desired interface.

### build a LAN cluster with office PCs

setup the PC IPs:

```shell
# master node:
#   IP address: 192.168.0.1    netmask: 24
#   Gateway: null              DNS server: 10.3.101.101
# worker node:
#   IP address: 192.168.0.12   netmask: 24
#   Gateway: 192.168.0.1       DNS: 10.255.18.3
```

* `ufw disable`
* update the `/etc/hosts` file (see the sketch below):
  192.168.0.1 master
  192.168.0.12 worker
  if the default hostname needs to change to `worker`, modify the `/etc/hostname` file and reboot.
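a small sketch of that host-naming step, assuming the master/worker addresses above and that the nodes run systemd (so `hostnamectl` is available):

```shell
# append the cluster names to /etc/hosts on every node
cat <<'EOF' | sudo tee -a /etc/hosts
192.168.0.1  master
192.168.0.12 worker
EOF
# rename this node to "worker" without editing /etc/hostname by hand
sudo hostnamectl set-hostname worker
```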
* ping test:

```shell
ping -c 3 master
ping -c 3 worker
```

* set up ssh access (optional):

```shell
sudo apt-get install openssh-server
ssh-keygen -t rsa            # or use a customized key
cp rsa_pub.key authorized_keys
```

refer
Ubuntu add static route
10 useful IP commands]]></content>
<tags>
<tag>Linux</tag>
<tag>network</tag>
</tags>
</entry>
<entry>
<title><![CDATA[未来5年在哪里11]]></title>
<url>%2F2019%2F10%2F29%2F%E6%9C%AA%E6%9D%A55%E5%B9%B4%E5%9C%A8%E5%93%AA%E9%87%8C11%2F</url>
<content type="text"><![CDATA[Reality always has a hidden side, and those who have glimpsed that hidden side are destined to be different from most people: either shining or lonely. Thinking of the long years of being unwilling to be harvested like a leek yet having no hope of rising, it practically spells a tragic life.

Fragile feelings
I think of myself as a simple person, which is why I like the engineer culture, the academic research atmosphere, and the community culture of the US. Back home, none of these atmospheres can be found; the air feels very much like a dark forest. I doubt how a simple person can survive in such an environment, so I spend all my effort figuring out how to adapt to society. My words are full of numbers and jargon, like a machine chatting rather than a person with feelings; girls cannot even get close to someone like me.

Talking with girls always reaches one topic: are you settled yet? This "next 5 years" series is already one year old. I changed environment, changed company, changed sub-direction, but what has fundamentally changed? Have I fixed a direction, settled any questions? Actually no; I haven't even figured out where I want to be (country, city) or which industry to work in (automotive, consulting). These are all self-imposed constraints.

Industries in small cities
On the train back I met a civil-servant couple from the small city traveling to Wuhan. They work at the local audit bureau; every year there are trainings around the country, e.g. Nanjing, Wuhan, plus 5~10 days of paid annual leave, so they take one or two domestic trips a year, a month at the longest, half a month at the shortest. Their salaries are 5~6k RMB; they bought an apartment in the small city and a German/Japanese car worth 100k+; colleagues from better-off families drive 300~400k ABB cars.

Inside the system: schools, hospitals, judicial organs, police, taxation, commerce administration, construction, banks, power, telecom, transportation, tobacco, agriculture and so on. This vast network absorbs a large number of high-quality young people in third- and fourth-tier cities, and since these are government organs, the young people inside naturally become loyal followers of the party-state. They earn above the local average, enough to live comfortably in a small city, even, like the young couple above, with a car, an apartment and two vacations a year. Compared with them, the young strivers in big cities renting rooms, working overtime, with zero annual leave, look downright miserable.

Outside the system: in third- and fourth-tier cities, the typical industries outside the (government) system are beauty and wellness, tutoring, restaurants, leisure and entertainment (cinemas, gyms, cultural projects), auto repair, clothing and furniture, the kinds of businesses one can start. Outside Jiangsu and Zhejiang, few third- and fourth-tier cities have their own traditional pillar industries, such as agricultural processing, energy and petrochemicals, machining, or small-commodity manufacturing; even fewer have high value-added pillar industries such as electronic component clusters, chip clusters, automotive clusters, biopharma clusters, IT clusters, precision manufacturing clusters, or financial services. The small city has a single state-owned steel mill, and with the overcapacity of energy and steel in the north, its days are numbered. Developing high value-added industries in such cities is a castle in the air, so the livelihood of young people outside the system is effectively hollowed out.

Hollowing out: "hollowing out" is not a put-down of the service industry, but a comparison with small cities in developed Western countries, which have unremarkable-looking yet century-old, world-class companies and products. For example, Wolfsburg in Germany is a world automotive industry center; Auburn Hills in Michigan is a cluster of world-class automotive software vendors. Young people there can of course choose the local service industry: fast-food chains, supermarkets and clothing shops, even renovation, plumbing and electrical work; but young people who do research and know technology can also easily find a company that creates absolute value. The "hollowing out" of Chinese third- and fourth-tier cities is that, beyond the traditional service industries and the government system, there are no pillar companies producing absolute value to absorb young people. Young people are pushed into traditional services or government institutions, and the opportunities to exercise their creativity are far too few. The service industry is icing on the cake: without a world-class position in high value-added manufacturing, services stay confined to traditional food, clothing, housing and transport. One obvious consequence is the damage to young people's value system: after finishing graduate school, your income and sense of achievement are not even as good as Wang Laoer next door, a primary-school graduate selling breakfast. Then why did the state invest so much in education, to make these young people feel inferior?

The endgame: with the people kept uninformed, ordinary people simply don't see the problem. In the short term the service industry indeed lags behind the value-creating industries: Wang Laoer selling breakfast can still make 50k a month and never needs to pay for a university graduate. In the longer run, low-end traditional services such as restaurants, repair shops and clothing shops will slowly saturate and even fall into vicious competition; every newly opened restaurant attracts customers for its first two months and then loses the traffic to the next new restaurant next door. Once most traditional service businesses reach that stage, outside the system directly controlled by the government, ordinary people may no longer bear it: either the government forcibly maintains stability and the whole country falls into the "middle-income trap", or society starts to descend into chaos. China has a large number of third- and fourth-tier cities which probably all face the hollowing-out problem, and will most likely enter the middle-income trap. Those who see the problem yet can do nothing will choose to leave before this plays out. Soft power at the national level cannot be built in a day; this is probably also why, despite the superficial GDP numbers, China's social system still lags far behind mature developed countries, unless a better system can slowly evolve before then.

The autonomous-driving layoff wave
Some forecasts say that by 2030 the penetration of L3+ autonomous driving may reach only 10%, against the common optimistic prediction of L3+ series production soon after 2020. China recently issued some very policy-driven industry licenses: the world's first commercial 5G license, the world's first commercial autonomous-driving license. By contrast, Apple will not even ship 5G in 2020, and the US disbanded its autonomous-driving transportation committee. Is China really leading, or is the state just painting a big pie? With so much new labor entering the market every year, the state is in a hard place too. Policy can paint a big pie for an industry, but the market is more rational than policy: Waymo's valuation has already been cut by the capital market, which should be the market's honest response to this industry. Given how capital lags, a layoff wave in this industry will probably follow. If the layoffs come, where is the next career? Change is the only constant. The more senior the employee, the more exposed to layoffs. In an internet-speed, rapidly iterating market, the portability of a specialty may serve an employee better than its depth.

Moving into the service industry
Watching "The Captain" (中国机长) deepened this feeling once more. Aircraft design, aero engines and automatic flight-control systems are far outside ordinary people's concern; by contrast, captains, flight attendants, airport operations and service staff absorb a huge amount of employment and create value most directly, and the value their service creates is the main contributor to and determinant of economic vitality. Even though the country has no home-grown civil aero engine and no home-grown CAE software, who feels that is a problem? Only a small group of people worry about it; most people, even policy makers, only need to see China's air-transport numbers rising again and airlines contributing more to local GDP, and morale soars. The closer to the customer, the greater the value. In a market without a first-mover advantage in core technology, doing technology is only a backup need; the service industry sits closer to the huge base of end customers, where the value chain is shortest, the loss smallest and the profit largest. So after the hollowing out, the elites all go to high value-added services such as finance and the internet. We really shouldn't worry on behalf of the policy elite; being a short-sighted ordinary person, reading the times and entering an industry that makes money, is exactly what an ordinary person should do.

So, I still need to prepare well for the MBA.]]></content>
</entry>
<entry>
<title><![CDATA[未来5年在哪里 12]]></title>
<url>%2F2019%2F10%2F29%2F%E6%9C%AA%E6%9D%A55%E5%B9%B4%E5%9C%A8%E5%93%AA%E9%87%8C-12%2F</url>
<content type="text"><![CDATA[I have been working in China for half a year now. I customized an open-source simulation platform, and quickly got broad exposure to HD maps, ros, scenario definition and so on. I also set up the team's gitlab and jenkins services, a rosbag management service, a webviz service, a pg map service, etc., and invested some time in building a data infrastructure framework, although there was no real business demand for it yet. In a traditional OEM, using DevOps to standardize the development process and exploring data business and cloud services are still relatively novel; by contrast, the emerging EV startups integrate DevOps and cloud-based development much more naturally. Essentially my work is developing ADS supporting tools and data infrastructure, but since the OEM's in-house ADS development team is rather weak, these supporting tools and data tools don't deliver much value. As a tool developer, the sense of presence fades, because the work is far from the core business, even though the technology is very general.

OEM's role in emerging tech
An OEM itself mostly doesn't build tools. Even Ford, who hired a great team to develop CAE tools in the early days and built them into something tremendous, still retired most of them or sold them as packages to pure software vendors later. On the other side, the pioneers or leaders of an industry usually build and apply new methodology before the market is there; once the market is there, professional vendors can take charge. The leaders play the role of incubators, just as many Internet startups got their ideas from giants, e.g. Google, Facebook e.t.c. ADS at this moment is the emerging tech: there is no mature solution or tool chain for ADS, so the leading automobile makers and suppliers are the ones who should take the lead in setting up standards and tools until the ADS market matures, and the giants in automobile should catch this great opportunity to take control in the long term. The problem of Chinese OEMs (sadly there are no strong Chinese suppliers yet) is that they are accustomed to being followers: when US or German automakers do the research and set a new business strategy, Chinese OEMs will follow. As one blog said, the Chinese have no friends, and neither does the Chinese government. Being followers or imitators is also the minimal-loss strategy for most Chinese companies, and this is also why Western companies are afraid of the competition from China: copying others' ideas is taken as a strategy for success rather than a shame.

a tool developer
By coincidence, at the very beginning of my career I belonged to a tool-support group, developing and maintaining tools. With the burst of ADS, more and more software skills are required by traditional automobile makers and suppliers, and they would like to transform themselves into modern, digitally driven companies. But a particular tool is never the core product of an automaker. There comes a time when the OEM engineers ask: why do you guys build tools rather than buy them from suppliers? The right thing for OEMs is to integrate components rather than build tools; the cost of building tools inside the OEM is really too high, as we realized after a while. But at first, OEMs thought it was too costly to buy commercial tools, which gave a small team survival space to grow inside the OEM, and in the early days a small team dreams big, especially now that many open-source projects can be used freely. The small team supported almost every aspect related to software tools, and also added features to the simulation platform, which is itself based on the open-source project lg-sim; software manager looked like a good role in such a team. But the reality is that customizing any tool costs a small team a huge amount of energy, since we are not involved in these open-source projects. Moreover, the management doesn't plan for an independent tool team. Even so, we are moving slowly.
infrasturcture for ABCfor ADS startups, they emphasize their infrasturcture features with : cloud platform remote monitor/control system (for vehicle team management) OTA data infrastructure(storage, management, analysis) and in single vehicle features with powerful computing ability, from GTC 2018 speak, most startup use the powerful Xdriver GPU; WeRide Momenta Tusimple AutoX Roadstar.AI (dead) plusAI ponyAI all these looks very Internet, but too far from massive vehicle product. the other mind is these teches are actually common for average engineers, but the chanllenge is to build a team, to realy make it work. survive in OEMit’s maybe more interested to take myself as an ABC(AI, big data, cloud computing) guy to find application scenarios in automobile industry product development. while ABC application scenarios are not matured yet of course, usually it is driven by leader teams, when it comes to drive by market, may be too late. beyond automotive, industry IoT maybe an even general domain to quickly apply ABCs. there should be three steps: familiar with the tech methods understand the hurting points of the industry product and management update -> value return the life leap happens to a few lucky guys. most guys are common and grow in a linear speed. there is a burden the more you know, the harder your life. sadly … digitalizeOEMs are involved more and more in transfering digital, smart. previously most invovled in market, ads strategy, supply chain management; and as connected vehicles, self driving, future mobility service bursting in recent years, digitalization is sinking in vehicle product R&D. AI, big data, cloud, blockchain e.t.c. new techs will be transfering the traditional industry. hopefully we can define the new bussniess together.]]></content>
</entry>
<entry>
<title><![CDATA[lg-sim source code review (2019.09)]]></title>
<url>%2F2019%2F10%2F22%2Flg-sim-source-code-review-2019-09%2F</url>
<content type="text"><![CDATA[the recent version of lg-sim 2019.09 has additionally support for webUI, cloud, which is a big enhencement. the following code review is focusing on new components to support WebUI, cluster management. databasedatabase used sqlite project, which is an embedded in-memory db; the ORM used PetaPoco project; there are a few db related serices, e.g. vehicleService, simulationService, which will be used in RESTful web request. DatabaseManager: 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152static IDatabaseBuildConfiguration DbConfig ;Init(){ DbConfig = GetConfig(connectionString) using db = new SqliteConnection(connectionString){ db.Open()}CreateDefaultDbAssets(){ using db = Open() if (info.DownloadEnvs != null){ map = new MapModel(){url = url LocalPath=localPath;} id = db.Insert(map) } if (info.DownloadVehicles != null){} if(defaultMap.HasValue) { sim1 = new SimulationModel(){ Cluster=0, Map=defaultMap.Value, ApiOnly=false}; AddSimulation(db, sim1, noBridgeVehicle); AddSimulation(db, sim, apolloVehicle); db.Insert(simx) }}AddSimulation(IDatabase db, sim,vehicle){ conn = new ConnectionModel(){ simulation = id, vehicle = vehicle.Value, } db.Insert(conn)}AddVehicle(db, sim,vehicle){ conn = new VehicleModel(){ Url = url , } db.Insert(vehicle) return vehicle.Id;}PendingVehicleDownloads(){ using db=Open(){ var sql = Sql.Builder.From("vehicles").Where("status=00", "Downloading"); return db.Query<VehicleModel>(sql); }} database ORMsusing PetaPoco to define ORM models, such asusers, sessions, maps, vehicles, clusters, simulations PetaPocoa few operators in PetaPoco, such as Page, used to query a page; Single, used to query a single item; Insert; Delete; Update database provider123SQLiteDatabaseProvider => GetFactory("Mono.Data.Sqlite.SqliteFactory, Mono.Data.Sqlit") called inside DatabaseManager::Init(). database services VehicleService : IVehicleService MapService NotificationService => NotificationManager.SendNotification() DownloadService => DownloadManager.AddDownloadToQueue() ClusterService SessionService SimulationService Web modelweb model is where WebUI server located, which is served by Nancy project, in which a new design metholody is applied: Inversin of Control Contaner(IoC), namely 控制反转 in chinese. IoC is used to inject implementation of class, and manage lifecycle, dependencies. web model is the server where talk to webUI through routes, which is defined in the React project. Nancyused to build HTTP based web service, FileStream StreamResponse TinyIoC UnityBootstrapper : DefaultNancyBootstrapper, used to automatic discovery of modules, custm model binders, dependencies. /Assets/Scripts/Web/Nancy/NancyUnityUtils 12345678910111213141516171819202122232425ConfigureApplicatinContainer(TinyIoCContainer container){ container.Register<UserMapper>(); container.Register<IMapService>(); container.Register<IClusterService>(); container.Register<IVehicleService>(); container.Register<ISimulationService>();} ``` ### route modules ```c#SimulationModule : NancyModule(){ class SimulationRequest{} class SimulationResponse{} Get("/", x={}) Post("/", x={}) Put("/{id:long}", x={}) Post("/{id:long}/start", x={}) Post("/{id:long}/stop", x={}) } inside some of the route modules will do query or update the sqlite database for special objects. till here is the complete web back-end server, with a http server and an in-memory database. the webUI frontend is based on React, about React check a previous blog. 
in the React WebUI project will have http requests corresponding to each of routes defined here. React front end is very independent framework, which should be handled by an independent team in standard pipeline. Config & Loader /Assets/Scripts/Web/Config.cs: used to define WebHost, WebPort, ApiHost, ClourUrl, Headless, Sensors, Bridges e.t.c Loader.cs : 12345678910111213141516171819202122232425262728293031323334353637383940Loader Instance ; //Loader object is never destroyed, even between scene reloadsLoader.Start(){ DatabaseManager.Init(); var config = new HostConfguration{} ; // ? Server = new NancyHost(Config.WebHost); Server.Start(); DownloadManager.Init();}Loader.StartAsync(simulation){ Instance.Actions.Enqueue() => var db = DatabaseManager.Open() AssetBundle mapBundle = null; simulation.Status "Starting" NotificationManager.SendNotification(); Instance.LoaderUI.SetLoaderUIState(); Instance.SimConfig = new SimlationConfig(){ Clusters, ApiOnly, Headless, Interative, TimeOfDay, UseTraffic... } }Loader.SetupScene(simulation){ var db = DatabaseManager.Open() foreach var agentConfig in Instance.SimConfig.Agents: var bundlePath = agentConfig.AssetBundle var vehicleBundle = AssetBundle.LoadfromFile(bundlePath) var vehicleAssets = vehicleBundle.GetAllAssetNames(); agentConfig.Prefab = vehicleBundle.LoadAsset<GO>(vehicleAssets[0]) var sim = CreateSimulationManager(); Instance.CurrentSimulation = simulation } DownloadManager class Download(); Init(){ client = new WebClient(); ManageDownloads(); } network modulenetwork module is used for communication among master and clients(workers) in the cluster network for cloud support. the P2P network is built on LiteNetLib which is a reliable UDP lib. LiteNetLibLiteNetLibis Udp package, used to P2P communication among master node and slave nodes here, where are defined as MasterManager, ClientManager. usage of LiteNetLib can be found in the wiki-usage)]]></content>
<tags>
<tag>lgsvl</tag>
<tag>network</tag>
<tag>database</tag>
</tags>
</entry>
<entry>
<title><![CDATA[IoC in C#]]></title>
<url>%2F2019%2F10%2F22%2FIoC-in-C%2F</url>
<content type="text"><![CDATA[Inversion of Control(IoC) is a design mechanism to decouple components dependencies, a light-weight implementation is: TinyIoC, which is also part of Nancy. IoC idea uses commonly in webUI(and backend server) apps, which is an user friendly solution to cloud deployment management as well as apps in mobile, which should be the right direction in future ADS software tool development. the idea of IoC can be explained by the following example from register and resolve in unity container 12345678910111213141516171819202122232425262728293031323334353637383940public interface ICar{ int Run();}public class BMW : ICar{ private int _miles = 0; public int Run() { return ++_miles; }}public class Ford : ICar{ private int _miles = 0; public int Run() { return ++_miles; }}public class Driver{ private ICar _car = null; public Driver(ICar car) { _car = car; } public void RunCar() { Console.WriteLine("Running {0} - {1} mile ", _car.GetType().Name, _car.Run()); }} the Driver class depends on ICar interface. so when instantiate the Driver class object, need to pass an instance of ICar, e.g. BMW, Ford as following: 123456789101112Driver driver = new Driver(new Ford());driver.RunCar()``` **to use IoC**, taking UnityContainer framework as example, a few other choices: TinyIoC e.t.c.```c# var container = new UnityContainer(); Register create an object of the BMW class and inject it through a constructor whenever you need to inject an ojbect of ICar. 12container.Register<ICar, BMW>(); Resolve Resolve will create an object of the Driver class by automatically creating and njecting a BMW object in it, since previously register BMW type with ICar. Driver drv = container.Resolve<Driver>(); drv.RunCar() summarythere are two obvious advantages with IoC. the instantiate of dependent class can be done in run time, rather during compile. e.g. ICar class doesn’t instantiate in Driver definition automatic new class management in back. the power of IoC will be scaled, once the main app is depends on many little services.]]></content>
<tags>
<tag>C#</tag>
</tags>
</entry>
<entry>
<title><![CDATA[rosbag tools in ADS]]></title>
<url>%2F2019%2F10%2F21%2Frosbag-tools-in-ADS%2F</url>
<content type="text"><![CDATA[backgroundin ADS, data fusion, sensor performance, L3+ perception, localization algorithms development relys a lot on physicall data collection, commonly in rosbag format with information/data about gps, rtk, camera, Lidar, radar e.t.c. to build up the development process systemly is a critial thing, but also ignored by most ADS teams. for large OEMs, each section may have their own test vehicles, e.g. data fusion team, sensor team e.t.c, but few of them take care data systematically, or build a solution to manage data. one reason is the engineers are lack of ability to give software tool feedbacks/requirements, so they still stay and survive with folders or Excel management, which is definitely not acceptable and scalable for massive product team. thanks for ROS open source community conributing a great rosbag database manage tool: bag_database. with docker installation, this tool is really easy to configure. a few tips: web server IP port in Docker can be accessed from LAN by parameter -p during docker run. mount sbm driveas mentioned, most already collected rosbag is stored in a share drive, one way is to mount these data. 1234sudo mount -t cifs //share_ip_address/ROS_data /home/david/repo/rosbag_manager/data -o uid=david -o gid=david -o credentials=/home/david/repo/rosbag_manager/data/.pwd sudo umount -t cifs /home/david/repo/rosbag_manager/data Tomcat server configurebag_database is hosted by Tomcat, the default port is 8080. For our services, which already host pgAdmin4 for map group; gitlab for team work; xml server for system engineering; for webviz. check the port is occupied or not: 1netstat -an | grep 8088 so configure /usr/loca/tomcat/conf/server.xml: 12345 <Service name="Catalina"> <Connector port="your_special_port" ...</Service> a few other toolsros_hadoopros_hadoop is a rosbag analysis tool based on hdfs data management. which is a more scalable platform for massive ADS data requirements. if there is massive ros bag process in need, ros_hadoop should be a great tool. there is a discussion in ros wiki install hadoopapache hadoop download single alone install concept in hdfs namenode daemon process, used to manage file system datanode used to data block store and query secondary namenode used to backup hdfs shell command hdfs path URL hdfs-site.xml/usr/local/hadoop/etc/hadoop/hdfs-site.xml configure file dfs.datanode.data.dir -> local file system where to store data blocks on DataNodes dfs.replicaiton -> num of replicated datablocks for protecting data dfs.namenode.https-address -> location for NameNode URL dfs.https.port -> copy local data into hdfs hdfs dfs -put /your/local/file/or/folder [hdfs default data dir] hdfs dfs -ls mongodb_storemongodb_store is a tool to store and analysis ROS systems. also in ros wiki mongo_rosmongo_ros used to store ROS message n MongoDB, with C++ and Python clients. mongo-hadoopmongo-hadoop allows MongoDB to be used as an input source or output destination for Hadoop taskes. ros_pandasrosbag_pandas tabblestabbles used to tag any rosbags or folders. hdfs_fdwhdfs for postSQL at the endtalked with a friend from DiDi software team, most big Internet companies have their own software tool teams in house, which as I know so far, doesn’t exist in any traditional OEMs. is there a need for tool team in OEMs? 
the common sense is at the early stage, there is no need to develop and maintain in-house tools, the commericial ones should be more efficient; as the department grows bigger and requires more user special development and commericial tools doesn’t meet the needs any more, tool teams may come out. still most decided by the industry culture, OEM’s needs is often pre-defined by big suppliers, so before OEMs are clear their software tool requirements/need, the suppliers already have the solutions there – this situation is epecially true for Chinese OEMs, as their steps is behind Europen suppliers maybe 50 years.\ I am interested at the bussiness model of autovia.ai, which focus on distributed machine learning and sensor data analytics in cloud with petabyte of data, with the following skills: large scale sensor data(rosbag) processing with Apache Spark large scale sensro data(rosbag) training with TensorFlow parallel processing with fast serialization between nodes and clusters hdmap generation tool in cloud metrics and visulization in web loading data directly from hdfs, Amazon S3 all these functions will be a neccessary for a full-stack ADS team in future to development safety products, which I called “ data infrastructure for ADS”. referMooreMike: ros analysis in Jupter mount smb share drive to ubuntu autovia.ai]]></content>
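As a small illustration of that "data infrastructure for ADS" idea, a sketch like the following can already make a share drive full of bags searchable before a full bag_database or hdfs setup exists. This is a minimal sketch, assuming ROS1's `rosbag` Python package is available and the bags sit under the mounted folder from above; the catalog file name is a placeholder, not part of the original post.

```python
"""Minimal rosbag catalog builder (sketch): scan a folder of *.bag files and
dump per-bag metadata to a CSV. Assumes a ROS1 install providing `rosbag`;
the data path below reuses the mount point shown earlier in this post."""
import csv
import glob
import os

import rosbag


def index_bag(path):
    """Collect basic metadata for one bag: size, duration and per-topic counts."""
    with rosbag.Bag(path) as bag:
        info = bag.get_type_and_topic_info()
        return {
            "file": os.path.basename(path),
            "size_mb": round(os.path.getsize(path) / 1e6, 1),
            "duration_s": round(bag.get_end_time() - bag.get_start_time(), 1),
            "topics": "; ".join(
                "%s(%s, %d msgs)" % (name, t.msg_type, t.message_count)
                for name, t in sorted(info.topics.items())
            ),
        }


def build_catalog(bag_dir, out_csv="bag_catalog.csv"):
    rows = [index_bag(p) for p in sorted(glob.glob(os.path.join(bag_dir, "*.bag")))]
    with open(out_csv, "w") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "size_mb", "duration_s", "topics"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


if __name__ == "__main__":
    build_catalog("/home/david/repo/rosbag_manager/data")
```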
<tags>
<tag>ros</tag>
</tags>
</entry>
<entry>
<title><![CDATA[safety guy in AV]]></title>
<url>%2F2019%2F10%2F16%2Fsafety-guy-in-AV%2F</url>
<content type="text"><![CDATA[backgroundwhy Waymo still can’t release fully driveless cars on road; why vehicle startups, NIO is going to brankruptcy? one of the hidden reasons vehicle as a consumer product can’t go to fast iteration as social media, mobility apps in Internet companies, is safety. vehicle development(VD) needs to satisfy safety requiements at first priority. so in VD, the cutting edge tech/ new solutions usually doesn’t make a difference, but the processes, how to handle safety requirements, system requirements, e.t.c are really the first thing in vehicle engineers’ mind. so now, deep learning AI in object detection is fairly matured, but it’s not used common in vehicle camera products. and for these start ups, who say themselves as tech-driven doesn’t survive well, because VD should be process driven. the best managment/survive way for original equipment manufacture(OEM) companies is build up their process system. e.g. GM managemnet is about process, and Toyota agile management is about a more flexible process, too. no OEMs said their team is tech-driven. what is SOTIF ?safety of the intended functionality(SOTIF): the absence of unreasonable risk due to hazards resulting from functional insufficiencies of the intended functionaliyt or by reasonably foreseeable misuse by persons. SOTIF is a way to define safey by design, while still there are unsafe post design, which is beyond the standards can handle. safety chanllenges in ADSis conserative behaivor of AI unsafe ? YES. as Waymo cars driving in San Fransisco, it’s more conservative than human drivers, which makes itself unsafe and surrounding vehicles unsafe. and the definiation of conservative or aggressive of driving behavior is further area depends, e.g. how human drivers drive in San Fransisco is different than drivers from Michigan, how AI handle this? how does AI satisfy VD requirements, and how to verify the AI really satisfy? there is no standard solution yet. since AI is black box, it may satisfy VD requirements in 99.9% situations, but failed in 0.1% case. an ADS solution works well in 99% scenarios is not an acceptable solution at all, which is really big chanllenge for most self driving full-stack solution start-ups. so Waymo and a few giants are working on test driven verification, which requires build up a very strong simulation team to power up safety test by simulation, which’s even chanllenging for most traditional OEMs. and definitely, there is no way to handle unstructured stochastic traffic environments, classification of safety hazards IS26262 internal cyber security (ISO21434) functional insufficient & human misuse (IS02148) external infrastructura cyber security (traffic light system, jamming sensors signals) malfunction behaviors of EEHarzard Analysis & Risk Assignment | | V ASIL B/C/D <-- sys requires * func redundency * system safety mechanism * predictive safety mechanism this is a close-loop analysis, to satisfy sys requirement will get new situations, which lead to new system requirements system requirements ^ | | V SOTIF Analysis this is a double direction design, with any new requiremenst, there is a need to do SOTIF analysis, which may find out some potential bugs/problems in the design, then back to new system requirements. in summary, the input to SOTIF analysis is system requirements, and output is new system requirements. functional insufficients unsafe by design e.g. 
full pressure brake design, is not sufficient in scenarios, when there is rear following vehicle in full speed Tech limitations safe known unsafe known safe unknown unsafe unknown SOTIF is to identify all unsafe cases and requies to make them safe, but not know how, but SOTIF can’t predict unknown. back to why Waymo doesn’t release their driveless vehicle yet, cause they don’t have a way to prove their ADS system has zero unsafe unknown situations, and most possiblely the current ADS solution may be reached 80% functions of the final completed fully ADS solution, but that’s even not the turning point for ADS system. where to go? rule based vs AI based as neither purely rule based nor purely AI based can achieve safety goals, so there are combined ways, but then there need a manager module to decide which should be weighted more in a special situation. a new undeterministic/unpredicted system explaination reinforcement learning is actually a good way to train agents in unpredicted simulation environemnt, but even in simulation, it can’t travel every case in the state space; then to use R.L to train vehicles in real traffic environment with random pedestrains, npcs, situations, which’s impossible so far. a new AI system current AI is still correlation analysis, while human does causal reasoning. refersafety first for automated driving handover to PR ISO 26262 standard how to reach complete safety requirement refinement for autonomous vehicle]]></content>
</entry>
<entry>
<title><![CDATA[cruise webriz configure]]></title>
<url>%2F2019%2F10%2F16%2Fcruise-webriz-configure%2F</url>
<content type="text"><![CDATA[backgroundduring ADS development, ros bag is used a lot to record the sensor, CAN and scenario info, which helps in vehicle test, data fusion, perception.cruise webviz is an open source tool to visualize rosbag and allow user defined layout in web environment, without the pain to run roscore and other ros nodes. while js is not a common tool in my visibility, the way how an js project is organized is really confused at the first time. there are a punch of new things: react [regl]https://www.giacomodebidda.com/how-to-get-started-with-regl-and-webpack/) and js functions/modules are really patches, they can be pached anywhere in the project, so usually it requires a patch manager(e.g. lerna) to deal with versions, dependencies etc; and bigger there are packages, which is independent function component. webviz has a few packages, e.g. regl-worldview, webviz-core, which has more than 40 patches(modules) used. cruise:worldview cruise:webviz core rosbag.js lernalerna is an open source tool to manage js project with multi packages. lerna init will generate lerna.json, which defines the required modules package.json defines useful infomation about the module depencies, CLI(which give a detail about how things work behind) and project description lerna bootstrap --hoist react will install all required modules in the root folder /node_modules, if doesn’t work well, leading to Can't Resovle module errors, may need manually install some. webpackwebpack is an open source tool for js module bundler. webpack.config.js, defines entry, where the compiler start; output , where the compiler end; module, how to deal with each module; plugin, additionaly content post compiliation; resovle.alias define alias for modules. 1234567891011121314var path=require("path")module.exports ={ entry: { app:["./app/main.js"] } output: {path: path.resolve(__dirname, "build")} }npm install webpack-dev-servernpm list | head -n 1 webpack-dev-server --inline --hot the JS bundle could be loaded from a static HTML page, served with any simple web server. That’s exactly what this /packages/webviz-core/public/index.html file is, which is used for https://webviz.io/try. The webpack dev server is configured to serve it at /webpack.config.js npm commands123456npm config list npm install [email protected] [-g] [--save]npm uninstall packages npm search packagesnpm cache clean --force webpack-serverwebpack-server is a little nodejs Express server. to install it as a CLI tool first, then run it under the webviz project root folder, with optional parameters, e.g. 
–host, –port e.t.c webviz configure npm config set registry https://r.npm.taobao.org npm install puppeteer –unsafe-perm=true npm run bootstrap if failed: sudo npm cache clean –force npm install -g lerna npm run bootstrap sudo npm install/rebuild node-sass npm run build if failed, manually installed unresovled modules npm test if failed based on a few test modules, install then npm install webpack-dev-server –save webpack-dev-server.js –host IP –port 8085 #under project root a little hack, by default npm install inter-ui phli’s inter-ui), where insidewebviz/packages/webvix-core/src/styles/fonts.module.scess, it uses: url("~inter-ui/Inter UI (web)/Inter-UI.var.woff2") format("woff2-variations"), modified this line to: url("~inter-ui/Inter (web)/Inter.var.woff2") format("woff2-variations") refertabbles webviz in ros community access web-server in LAN issue: how to run cruise webviz Ternaris Marv minio: high performance object storage for AI webviz on remote bag visuaize data from a live server npm package install in China taobao.npm]]></content>
<tags>
<tag>ros</tag>
<tag>cruise</tag>
</tags>
</entry>
<entry>
<title><![CDATA[reinforcement learning in nutshell-2]]></title>
<url>%2F2019%2F10%2F13%2Freinforcement-learning-in-nutshell-2%2F</url>
<content type="text"><![CDATA[function approximationwrite the mapping from \ pair to value as $ S X A -> Q $. usually S, Q can be continuous high-dimensional space, and even in discrete state problem. e.g. in GO game, the state space is about 10^170, to store each \ pair in a table is almost impossible. a good and approximated way is using a hash map, where input a state, and through this hash map to get its value, which is one step, rather than many steps even with a tree strucutre of the whole state-action/value space. in math, this is called function approximation, it’s very intuitive idea, to fit a function from the sample points; then any point in the space, it can be represented by the function’s paramters, rather than search in all the sample points, which may be huge; and further as only the function’s parameters is required to store, rather than the whole sample points information, which is really a pleasure. Deep Q-Learningpreviously in Q-Learning, to update Q(S,A): $ Q(S,A) <- \delta ( Q(S, A) + R + \gamma ( Q(S', A') - Q(S, A)) to approximate q(s,a) can use a neural network(NN), the benefit of NN is to approximate any function to any precise, similar to any complete orthonormal sequence, e.g. Fourier Series. CNN approximatorthe input to Q neural network is the eigenvector of state S, assuming the action set is finite. the output is action value in state-action-hyperparameter: q(s, a, \theta), Q(S, A, \theta) ~= Q(S, A) as the right most section, here assuming the action set is finite, so for each action, there is an output. experience replayinstead of discarding experiences after one stochastic gradient descent, the agent remembers past expereicnes and learns from them repeatdely, as if the experience has happened again. this allows for greater data efficiency. another reason, as DNN is easily overfitting current episodes, once DNN is overfitted, it’s difficult to produce various experinces. so exprience replay can store experiences including state transitions, rewards, and actions, which are necessary to perform Q learning. at time t, the agent’s experience e_t is defined as: all of agent’s experiences at each time step over all episodes played by the agent are stored in the replay memory. actually, usually the replay memory/buffer set to some finite size limit, namely only store the recent N experiences. and then choose random samples from the replay buffer to train the network. the key reason here is to break the correction between consecutive samples. if the CNN learned from consecutive samples of experience as they occurred sequentially in the environment, the samples are highly correlated, taking random samples from replay buffer. target networkuse a separate network to estimate the target, this target network has the same architecture as the CNN approximator but with frozen parameters. every T steps(a hyperparameter) the parameters from the Q network are copied to this target network, which leads to more stable training because it keeps the target function fixed(for a while) go futher in DQNDouble DQNhere use two networks, the DQN network is for action selection and the target network is for target Q value generation. the problem comes: how to be sure the best action for the next state is the action with the highest Q-value ? Prioritized Replay DQNDueling DQNyet a summaryat the begining of this series, I was thinking to make a simple summary about DQN, which looks promising in self-driving. 
After almost one month, more and more topics and great blogs keep jumping out, and the simple idea of making a summary is no longer so simple. I will stop here and come back to it later. refer: math equation editor; Petsc Krylov; zhihu: function approximate; dqnbook: experience replay; Toward Data Science: explained replay memory; going deeper into RL: understanding DQN; improvements in DQN]]></content>
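To make the experience-replay and target-network ideas above concrete, here is a minimal sketch. It is not the DQN from the papers: a tiny linear Q-approximator in numpy stands in for the CNN, the environment transitions are random placeholders, and the buffer size, batch size and sync period are illustrative assumptions.

```python
"""Minimal sketch of experience replay + a frozen target network."""
import random
from collections import deque, namedtuple

import numpy as np

Experience = namedtuple("Experience", "state action reward next_state done")

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # only keep the most recent N experiences

    def push(self, *args):
        self.buffer.append(Experience(*args))

    def sample(self, batch_size):
        # random (non-consecutive) samples break the correlation between steps
        return random.sample(self.buffer, batch_size)

class LinearQ:
    """q(s, a) = W[a] . s, one output per discrete action (stand-in for the CNN)."""
    def __init__(self, n_states, n_actions):
        self.W = np.zeros((n_actions, n_states))

    def q_values(self, state):
        return self.W @ state

    def copy_from(self, other):
        self.W = other.W.copy()                # snapshot used as the frozen target

def train_step(q_net, target_net, batch, gamma=0.99, lr=0.01):
    for e in batch:
        target = e.reward
        if not e.done:
            target += gamma * np.max(target_net.q_values(e.next_state))
        td_error = target - q_net.q_values(e.state)[e.action]
        q_net.W[e.action] += lr * td_error * e.state   # semi-gradient update

# usage: push transitions as the agent acts, sample minibatches, sync every T steps
buffer, q_net, target_net = ReplayBuffer(), LinearQ(4, 2), LinearQ(4, 2)
for step in range(1000):
    s, a, r, s2 = np.random.rand(4), random.randrange(2), random.random(), np.random.rand(4)
    buffer.push(s, a, r, s2, False)
    if len(buffer.buffer) >= 32:
        train_step(q_net, target_net, buffer.sample(32))
    if step % 100 == 0:
        target_net.copy_from(q_net)            # periodic copy keeps the target fixed in between
```

The two details the post emphasizes are both visible here: sampling random minibatches from the deque breaks the correlation between consecutive transitions, and the target network is only refreshed every T steps, so the bootstrap target stays fixed in between.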
<tags>
<tag>reinforcement learning</tag>
</tags>
</entry>
<entry>
<title><![CDATA[configure pgAdmin4 in server mode]]></title>
<url>%2F2019%2F10%2F13%2Fconfigure-pgAdmin4-in-server-mode%2F</url>
<content type="text"><![CDATA[backgroundpgSQL is common used to store hd map in ADS stack, and the client is usually pgAdmin4. in big tech companies, e.g. Tencent, Alibaba, Uber e.t.c, their data infrastructure for ADS stack development is usually based on pgSQL. for our team work, we also store most map info in pgSQL, and for better development toolchain, there is a need to configure the client pgAdmin4 as a web server, rather than request every developer to install a copy of pgAdmin4 at their local side. in LAN web server, apache2 is used to serve a few services, so there is a need to configure multi virutal host in Aapche2, with differnt port and same IP. pgSQLServer official install postgresql start pgSQLserver jump into pgSQL server: 1234/etc/init.d/postgresql start sudo su - postgres pgAdmin4installed by apt-get install pgadmin4, will put pgAdmin4 under local user or root permission, which will be rejected if accessed by remote clients, whose permission is www-data so it’s better to install from src and in a different Python virtual env. setup Python virtual Env install pgAdmin4 from src there maybe a few errors as following: * No package 'libffi' found * error: invalid command 'bdist_wheel' [sol](https://stackoverflow.com/questions/34819221/why-is-python-setup-py-saying-invalid-command-bdist-wheel-on-travis-ci) * error: command 'x86_64-linux-gnu-gcc' failed with exit status 1, [sol](https://stackoverflow.com/questions/21530577/fatal-error-python-h-no-such-file-or-directory) configure pgAdmin4follow this configure pgadmin4 in Server mode configure Apache2follow previous blog to set Apache2 for pgadmin4: <VirtualHost *:8084> ServerName 10.20.181.119:8084 ErrorLog "/var/www/pgadmin4/logs/error.log" WSGIDaemonProcess pgadmin python-home=/home/david/py_venv/pgenv WSGIScriptAlias /pgadmin4 /home/david/py_venv/pgenv/lib/python3.5/site-packages/pgadmin4/pgAdmin4.wsgi <Directory "/home/david/py_venv/pgenv/lib/python3.5/site-packages/pgadmin4/"> WSGIProcessGroup pgadmin WSGIApplicationGroup %{GLOBAL} Require all granted </Directory> </VirtualHost> restart Apache2 server will make this works. referwhy sudo - su godaddy about sudo - su configure pgadmin4 in Server mode]]></content>
<tags>
<tag>pgAdmin4</tag>
</tags>
</entry>
<entry>
<title><![CDATA[host mod_wsgi in Apache2]]></title>
<url>%2F2019%2F10%2F13%2Fhost-mod-wsgi-in-Apache2%2F</url>
<content type="text"><![CDATA[Apache2 BackgroundApache2 virutal host site-available all virtual hosts are configured in individual files with /etc/apache2/sites-available site-enabled until the *.config in site-available are enabled, Apache2 won’t know know. sudo sevice apache2 reload IP based virtual host use IP address of the connection to determine the correct virtual host to serve, so each host needs a separate IP we had name-based vhost in LAN, a few web servers sharing the same IP in the same physical machine, but with different Ports, and even we don’t use DNS server to tell the domains. apache mode coresrefer VirtualHost the title to define this virutal host, and tell which port to listen. by default is port80 ServerName sets the request scheme, hostname, and port that the server uses to identify itself ServerAlias sets the alternate name for a host WSGIDaemonProcess for wsgi app, which usually define in a seperate Python virtual environment, rather than the default localhost user or root. so WSGIDaemonProcess point to the python virtual env. WSGIScriptAlias point to the wsgi_app.wsgi DocumentRoot set the directory from which httpd/apache2 will serve files /var/www/html mod_wsgi install apt-get install apache2 libapache2-mode-wsgi-py3 config in Apache2config with configure file create example.conf under /etc/apache2/conf-available/ 12345678<VirtualHost *:8084> ServerName 10.20.181.119:8084 ServerAlias example.com DocumentRoot "/var/www/html" ErrorLog "/var/www/example/logs/error.log" </VirtualHost> the served web content is stored at /var/www/html, which can simple include a index.html or a few js. enable configure, which will create corresponding conf under conf-enable folder sudo a2enconf example check configure sudo apachectl -S sudo apachectl configtest config with virtual host create example.conf under /etc/apache2/site-available/12345678<VirtualHost *:8085> ServerName 10.20.181.119:8085 ServerAlias application.com DocumentRoot "/var/www/wsgy_example" ErrorLog "/var/www/wsgy_example/logs/error.log" WSGIScriptAlias /application /var/www/wsgy_app/application.wsgi</VirtualHost> here used the additional WSGI script, link 123456789101112131415161718import osimport syssys.path.append('/var/www/wsgy_app/')os.environ['PYTHON_EGG_CACHE'] = '/var/www/wsgy_app/.python-egg'def application(environ, start_response): status = '200 OK' output = b'Helo World' response_headers = [('Content-type', 'text/plain'), ('Content-length', str(len(output)))] start_response(status, response_headers) return [output] enable configure, which will create corresponding conf under site-enable folder sudo a2ensite wsgy_example check configure sudo apachectl -S sudo apachectl configtest add multi ports:in /etc/apache2/ports.conf: Listen 8083 Listen 8084 since this Apache server host system_engineering web and pgadmin4 web, and share the same IP. restart apachesudo systemctl restart apache2 test apachein browser: 10.20.181.119:8085/application should view “hello world” refername based virtual host Apache2 virutal host vhost with different ports]]></content>
<tags>
<tag>apache2</tag>
</tags>
</entry>
<entry>
<title><![CDATA[Gitlab integrated Jenkins]]></title>
<url>%2F2019%2F10%2F09%2FGitlab-integrated-Jenkins%2F</url>
<content type="text"><![CDATA[backgroundafter built the Gitlab server in LAN for two month, managing about 20 projects in ADS group currently. the needs for CI is coming up. at very beginning, I tried Gitlab CI runner, doesn’t work through. so Jenkins! Jenkins installationthere is a great series Maxfields jekins-docker tutorial, since originally I had Docker env and which is a prepartion for cloud deployment in future. Jenkins in Docker installation, start Jenkins in Docker 1docker run --rm -p 8080:8080 -p 50000:50000 -v jenkins-data:/var/jenkins_home -v /var/run/docker.sock:/var/run/docker.sock jenkinsci/blueocean setup wizard also Jenkins installed directly in Ubuntu Jenkins integrated with GitlabJenkins default has integrated with BitBucket and Github, to integrate with Gitlab: set Gitlab token in Gitlab Project main page, go to User Profile >> Settings >> Peronsal Access Token, naming the token and click api domain, the token is generated. install Gitlab Plugins in Jenkins main page: Manage Jenkins >> Manage Plugins >> Search Available Plugins (gitlab). add gitlab key credentials in Jenkins main page, go to Credentials, choose the key type to Gitlab API token, then generate the key credential, used to access Gitlab project. configure Jenkins with gitlab in Jenkins main page, go to Manage Jenkins >> Configure System, at the gitlab section, add the gitlab host url and use the credential created at previous step. since the gitlab server is hosted in LAN, even don’t have DNS for the gitlab server, purely raw IP address. so the gitlab host URL is like: http://10.20.110.110:80 rather than the project git url (e.g. http://10.20.110.110/your_name/your_project.git) Gitlab Hook Plugingitlab events will be post to Jenkins through webhook, which is a common way to notify external serivces, e.g. dingding office chat, JIRA e.t.c. To set it up, need to configure Gitlab Hook plugin in Jenkins. as mentioned in this post, JDK10 or 11 is not supported for Jenkins, if the current OS system has already JDK11, need addtionally install jdk8, and configure the default jdk=8: 123456update-java-alternatives --list sudo update-alternatives --config java java -version Jenkins Projectadd a new Jenkins Item, select FreeStyle, go to Configure. in Source Code Management section, select Git. add Repository URL and set gitlab username and password as Jenkins Credentials. in Build Triggers section, select Build when a change is pushed to Gitlab, which display the Gitlab webhook URL: http://localhost:8080/jenkins/project/demo; go to Adavance section to generate the secret token. when getting all these done, back to Gitlab project, as Admin or Maintainer, in Project Settings >> Integrations >> Add Webhooks.URL and Secret Token are from the previous settings. there is a common issue: Url is blocked: Requests to localhost are not allowed, please refer to allow request to localhost network from system hooks referriot games: putting jenkins in docker jenkins to gitlab authentication devops expert: Emil mind the product: PM’s guide to CD & DevOpsmanage multi version of JDK on Ubuntu a jianshu refer a aliyun refer tencent refer allow request to localhost network from system hooks]]></content>
</entry>
<entry>
<title><![CDATA[warm up Hilber space]]></title>
<url>%2F2019%2F09%2F29%2Fwarm-up-Hilber-space%2F</url>
<content type="text"><![CDATA[abstract space Cauchy Series in metric space, existing an any positive and small \etta, exist a natural number N, $m, n > N$, which satisfies: it’s called a Cauchy Series. intuitively, as the numbers goes above, the elements get closer. Complete Space any Caunchy Series in these abstract space, its convergence is still in the original space. e.g. rational number space is not complete space, as is not in rational number space anymore. intuitively, a complete space is like a shell without any holes on its surface. Linear Space with linear structural set, which can be described by base vector, so any hyper-point in a linear space can be represented by the linear combination of its base, so also called as vector space. in another way, linear space has only add and scalar multi operator. Metric Space to describe length or distance in linear space, adding normal in the linear space, gives normed linear space or metric space. Banach Space a complete metric space Inner Product Space with inner product feature in metric space. for the infinite space, there are two different sub space, either the inner product of the series converged or not. the convergence sub space is a completed space. Hilbert Space a completed inner product space Eurepean Space a finite Hilbert Space. functional analysisthe optimzed control theory problem, is to find the functional, under system dynamic constraints, which gives the extremum point(function) in the infinite normed linear space; in general. in general, the functional is depends on the system dynamic path, which gives some famuous theorem: Banach fixed point. PDE theory, partial geometry, optimized control theory are all part of functional analysis. from computational mechanics field, there is a chance go to PDE theory; from modern control field, there is a chance go to optimized control theory; from modern physics field, the students may also get familiar with partial geometry. the beauty of math is show up, all these big concepts finally arrive to one source: functional analysis. to transfer all these big things into numerial world, which gives CAE solver, DQN solvers e.t.c. the closest point property of Hilbert SpaceA subset of A of a vector space is convex: if for all a, b belongs to A, all \lamba such that 0 < \lamba < 1, the point belongs to A assuming A is non-empty closed convex set in Hilbert Space H, for any x belongs to H, there is a unique point y of A which is closer to x than any ohter point of A: orthogonal expansionsif (e_n) is an orthonormal sequence in a Hilbert space H, for any x belongs to H, (x, e_n) is the nth Fourier coefficient of x with respect to (e_n), the Fourier series of x with respect to the sequence (e_n) is the series: pointwise converges where f_nis a sequence of functions in complete orthonormal sequence given an orthonormal sequence of (e_n) and a vector x belongs to H then: but only when this sequence is complete, the right-hand side converge to left-hand side workks. an orthonormal sequence (e_n) in Hilbert Space H is complete if the only member of H which is orthogonal to every e_n is the zero vector. a complete orthonal sequence in H is also called an orthonormal basis of H. H is separable if it contains a complete orthonormal sequence, the orthogonal complement (which named as E^T) of a subset E of H is the set: for any set E in H, E^T is a closed linear subspace of H. 
the knowledge above gives the convergence in Hilbert Space, application like FEA, once created the orthonormal basis of solution space, the solution function represented by the linear combination of these basis is guranted to convergence. Fourier seriesit is a complete orthonormal sequence in L^2(-\pi, \pi), and any function represented by a linear combination of Fourier basis is converged. but here Fourier basis is the arithmetic means of the nth partial sum of the Fourier series of f, not the direct basis itself. Dual spaceFEA solutions for PDEPDE can be represented in a differential representation: or an integral representation: integral formulation has included existing boundary conditions. by multiplying test function \phi on both side of integral representation, and using Green’s first identity will give the weak formulation(variational formulation) of this PDE. in FEA, assuming the test function \phi and solution T belong to same Hilbert space, the advantage of Hilbert space, is functions inside can do linear combination, like vectors in a vector space. FEA also provides the error estimates, or bounds for the error. the weak formulation should be obtained by all test functions in Hilbert space. this is weak formulation due to now it doesn’t exactly requries all points in the domain need meet the differential representation of the PDE, but in a integral sense. so even a discontinuity of first derivative of solution T still works by weak formulation. Hilbert space and weak convergenceconvergence in another words means, the existence of solution. for CAE, DQN algorithms, the solution space is in Hilbert space. Lagrange and its dualityto transfer a general convex optimization problem with lagrange mulitplier: => we always looks for meaning solution, so L should exist maximum extremum. if not, as x grow, the system goes divergence, no meaningful solution. define: so the constrainted system equation equals to : to rewrite the original equation as: then the system’s duality is defined as : for a system, if the original state equation and its duality equation both exist extremum solution, then : the benefits of duality problems is the duality problem is always convex, even when the original problem is non-convex the duality solution give an lower boundary for the original solution when strong duality satisfy, the duality solution is the original solution Krylov Spacereferconvex func finite element method Lagranget duality standford convex optimization Going deeper into RF: understaing Q-learning and linear function approximation Nicholas Young, an introduction to Hilbert Space why are Hilbert Space important to finite element analysis]]></content>
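To make the weak-formulation step concrete, here is the standard derivation written out for a Poisson problem; the specific PDE and the zero boundary condition are stand-in assumptions, since the post does not fix a particular equation.

```latex
% strong form: find T such that
-\nabla^2 T = f \quad \text{in } \Omega, \qquad T = 0 \ \text{on } \partial\Omega
% multiply by a test function \phi \in H_0^1(\Omega), integrate, and apply Green's first identity
% (the boundary term vanishes because \phi = 0 on \partial\Omega):
\int_\Omega \nabla \phi \cdot \nabla T \, d\Omega = \int_\Omega \phi \, f \, d\Omega
    \qquad \forall \, \phi \in H_0^1(\Omega)
% Galerkin FEA: expand T \approx \sum_j c_j e_j over a finite basis and take \phi = e_i,
% which gives the linear system K c = F with
K_{ij} = \int_\Omega \nabla e_i \cdot \nabla e_j \, d\Omega, \qquad
F_i = \int_\Omega e_i \, f \, d\Omega
```

Only first derivatives of T survive in the weak form, which is exactly why a solution whose first derivative is discontinuous across element boundaries is still admissible, as noted above.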
</entry>
<entry>
<title><![CDATA[reinforcement learning in nutshell-1]]></title>
<url>%2F2019%2F09%2F18%2Freinforcement-learning-in-nutshell-1%2F</url>
<content type="text"><![CDATA[RL conceptsassuming timestep t: the environment state S(t) agent’s action A(t) discouted future reward R(t), which satisfy: with current state `s` and taking current action `a`, the env will give the reward `r_{t+1}`. `\gamma` is the discount ratio, which usually in [0,1], when $\gamma = 0 $, agent's value only consider current reward. check [discount future reward]() in following chapter. agent’s policy P(a|s), in current state s, the propability of action a agent’s state value V(s), in state s and took policy \pi, which usually described as an expectation: agent’s action value Q(s, a), which consider both state s and action a effects on value calculation. namely, agent’s value is the expectation of its action value with probability distribution p(a|s). state transfer propability P explore rate \eta, it’s the chance to choose non-max value during iteration, similar as mutation. Markov decision processassuming state transfer propability, agent's policy, and agent's value follow Markov assumption. namely, the current state transfer propability, agent’s policy and value only relates to current state. agent’s value function V(s) meets Belman’s equation: and agent’s action value function q(s) also has Belman’s equation: in general, env’s state is random; the policy to take action is policy P(a|s). Markov Decision process is: state, action, policy, each [state, action, policy] is an episode, which gives the state-action-reward series: s0, a0, r1, s1, a1, r2, ... s(n-1), a(n-1), rn, sn Markov assumption is state(n+1) is only depends on state(n). Belman’s equationBelman’s equation gives the recurrence relation in two timesteps. the current state value is calcualted from next future status with given current env status. discounted future rewardto achieve better reward in long term, always conside the reward as sum of current and future rewards: R(t) = r(t) + r(t+1) + ... r(n) on the other side, as the env state is random in time, the same action doesn’t give the same reward usually, and as time goes, the difference is even larger. think about the B-tree, as it goes deeper, the number of nodes is larger. so the reward in future doesn’t count the same weight as earlier reward, here define the discount ratio \gamma <- [0, 1] R(t) = r(t) + \gamma r(t+1) + ... + \gamma^(n-t) r(n) R(t) = r(t) = \gamma R(t+1) policy iterationthere are two steps: policy evaluation, with current policy \pi to evaluate state’s value V' policy improvment, with the state value V', with a special policy update strategy(e.g. greedy) to update policy. the ideal result is to find the fixed point in value space, which corresponds to the optimized policy at current state with current policy update strategy. in policy iteration, we only consider policy-value mapping. value iterationwhere policy iteration based valued is implicit, only value iteration explicit by Belman’s equation. this is also a fixed point application, as we can use the explicit mapping (Belman’s equation) in state value space. so there should guarantee existing an optimized policy. an optimization problemreiforcement learning solution is to find the optimal value function to achieve the max reward in each timestep, which leads to optimal policy \pi*, or max value func, or max action value func. $$ v(s) = max(v_i(s)) foreach i in value funcs $$ $$ q(s, a) = max(q_i(s,a)) foreach i in action value funcs $$ mostly the solution space is not known at hand, if else, the optimal solution can get directly. 
on the other hand, current state, action set tells a little information about the solution space, as they are part of the solution space, so the optimization algorithms used in convex space can be applied here. a few common optimization algorithms includes: dynamic programming(DP)DP can used in both policy evaluation and policy iteration, but in each calculation step(not even one physical step, and in one physical step, there can be hundreds or thousands of calculation steps) iteration, DP has to recalculate all state(t=current) value to the very first state(t=0) value, which is disaster for high-dimensional problem, and DP by default is kind of integer programming, not fitted to continuous domain either. Monte Carlo (MC)MC gets a few sample solutions in the solution space, and use these samples solution to approxiamte the solution space. in a geometry explaination, the base vectors of the solution space is unknown, but MC finds a few points in this space, then approximate the (nearly) base vectors, then any point in this space can be approximated by the linear combination of the nearly base vectors. MC has no ideas of the propability of state transfer, MC doesn’t care the inner relations of state variables, or model-free. MC is robost as it’s model free, on the other hand, if the solution space is high-dimensional, there is a high chance the sampling points are limited in a lower dimension, which weak the presentation of MC approximation; also MC needs to reach the final status, which may be not avaiable in some applications. time-serial Time Difference(TD)MC use the whole list of status for each policy evaluation; in 1st order TD, policy evaluation only use next reward and next state value: $ v(s) = R(t+1) + \gamma S(t+1)|S $ for n-th order TD, the update will use n-th reward : $ v(s) = R(t+1) + \gamma R(t+2) + ... + \gamma^(n-1) R(t+n) + \gamma^n S(t+n)|S $ it’s clear here, as n->infinitly, n-th order TD is closer to MC. so how close is enough usually, namely which n is enough ? SARSAin TD, there are two ways: on-line policy, where the same one policy is using to both update value func and upate action; while off-line policy, where one policy is used to update value func, the other policy is used to update action. SARSA is kind of on-line policy, and the policy is e-greedy, to choose the max value corresponding action in every iteration with a high probablitiy (1-e), as e is very small. $ \delta(t) v(S, A) = R + \gamma (v(S', A') - v(S, A)) $ in the iteration equation above, \delta(t) will give the iteration timestep size, S', A' are next state and next action, compare to pure 1-st order TD, where the next state value is modified as: $(Q(S’, A’) - Q(S, A)$, in this way value func keep updated in every func, which can be considered as local correction, compared to the unchanged pure 1-st order TD, which may be highly unstable/diverge in long time series. as time-serial TD can’t guarantee converge, so this local correction makes SARSA numerical robost. SARSA(\lambda) in mutli-step based is same as n-th order TD. $ v(s) = R(t+1) + \gamma R(t+2) + ... + \gamma^(n-1) R(t+n) + \gamma^n ( v(S`) - v(S) ) $ the problem of SARSA is v(S, A) may go huge, as the algorithm is on-line policy, which requires huge memory. Q-learningQ-learning is off-line policy TD. the policy iteration is use e-greedy policy, same as SARSA; while the policy evaluation to update value func use greedy policy. 
in a word: from status S, using e-greedy policy to choose action A, and get reward R, and in status S', and using greed policy to get action A'. $ \delta(t) Q(S, A) = R + \gamma (Q(S', A') - Q(S, A)) $ (a) where $ A= max v(a|S) $. while in SARSA, both S' and A' update using e-greed. usually Q(S,A) is called Q-valued. the benefit of choosing a different policy to update action, is kind of decouping status and action, so in this way, they can reach more area in real state-action space, which also lead the solution space a little more robost. but still equation (a) is not guaranted to converge as time goes. the converged Q(S,A) should be convex, which means its second-order derivative must be less than 0, then the max Q values (max extremum) achieves when first-order derivate is 0. while Q-learning has the same problem as SARSA, the huge memory to store Q(S,A) table. to make intuitive example, think about an robot walk with 2 choice(e.g. turn left, turn right), and the grid world has 20 box in line, which has 2^20 ~= 10e6 elements in Q(S, A). it can’t scale to any real problem. referSutton & Barto, Reinforcement learning: an introduction Pinard fromeast mathxml editor equation html editor fixed point mechanism]]></content>
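A minimal tabular sketch of the Q-learning update described above, with the e-greedy behaviour policy and the greedy max in the bootstrap target; the tiny corridor environment and all hyperparameters are toy assumptions, not from the post.

```python
"""Tabular Q-learning sketch: e-greedy for acting, greedy max for the target."""
import numpy as np

N_STATES, N_ACTIONS = 10, 2          # corridor cells; actions: 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(state, action):
    """Toy dynamics: move left/right, reward 1 only at the right end."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def epsilon_greedy(Q, state):
    if np.random.rand() < EPSILON:                 # explore
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(Q[state]))                # exploit

Q = np.zeros((N_STATES, N_ACTIONS))
for episode in range(200):
    state, done = 0, False
    while not done:
        action = epsilon_greedy(Q, state)          # behaviour policy: e-greedy
        next_state, reward, done = step(state, action)
        # off-policy target uses the greedy action A' = argmax_a Q(S', a)
        td_target = reward + GAMMA * np.max(Q[next_state]) * (not done)
        Q[state, action] += ALPHA * (td_target - Q[state, action])
        state = next_state
# for SARSA, the bootstrap would instead use Q[next_state, epsilon_greedy(Q, next_state)]
```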
<tags>
<tag>reinforcement learning</tag>
</tags>
</entry>
<entry>
<title><![CDATA[python modules in help]]></title>
<url>%2F2019%2F09%2F18%2Fpython-modules-in-help%2F</url>
<content type="text"><![CDATA[unittestwhy unit test ?unit test is to test the components of a program automatically. a few good reasons of unit tests: test driving practice early sanity checking for regression test, that the new changes won’t break early work unit test is to test the isolated piece of code, usually white box test written by developer; functional test is to test the function requirements, usually black box test written by testers. the testcase output can be OK, FAIL, ERROR. assertion functions basic boolean asserts: assertEqual(arg1, arg2, msg=None) assertIsNot(arg1, arg2, msg=None) comparative asserts: assertAlmostEqual(first, second, places=7, msg=None, delta=None) asserts for collections: assertListEqual(list1, list2, msg=None) assertTupleEqual(tuple1, tuple2, msg=None) assertSetEqual(set1, set2, msg=None) assertDictEqual(dic1, dic2, msg=None) command line interface run all unittests python3 -m unittest discover -v -c run single test module python3 -m unittest -v -c tests/test_XXXX.py run individual test case python3 -m unittest -v tests.test_XXX.TestCaseXXX.test_XXX how to write a unittestunittest.TestCaseany user defined test class should first derived from unittest.TestCase setUp() & tearDown()setUp() provides the way to set up things before running any method starting with test_xx() in user defined test class. tearDown() is where to end the setting ups depends on other modules/classusually the class/methods we try to test have depends on other modules/class, but as unit test want to isolate these depends, so either mock the other methods or fake data. in general there are three types of depends: pure variablethis is the most simple case, then you can directly define the same pure type variable in your test_xx() method methods in other moduleshere need to use patch, to mock the method and define its return value is your test_xx() method, then in the class where call the real method will be replace by this mocked one. methods in other classhere need to use mock, either MagicMock or Mock will works. mock & patchmock module is built in unittest.mock after a later Python version, if not, mock module can be installed by pip install mock, then directly import mock in code. the philosophy of unit test is to isolate classes test, but in reality most classes have some instances of other classes, so to test the ego class isolated from other classes, that’s where mock helps. 123456789101112131415from mock import MagicMock, patchclass TestScenario(TestCase): def test_mock(self): sim.method = MagicMock(return_value="xx") agent = sim.method() self.assertEqual(agent, "xx") @patch("module/add_agent") def test_patch(self, mock_add_agent): mock_add_agent.return_value = "xx" agent = module.add_agent() self.assertEqual(agent, "xx") Mock helps to mock the method in a class, while patch helps to mock a global method in a module. automatical unittest generatorthere are projects assit to generate unittest code automatically, e.g auger test with while LoopTODO test with raise ErrorTODO coverage.py (one time only) install coverage.py pip3 install –user coverage run all tests with coverage ~/.local/bin/coverage run -m unittest discover generate html report ~/.local/bin/coverage html –omit “~/.local/“,”tests/“ ps, output is in htmlcov/index.html logginglogging() supports more detail message type(info, warn, error, debug) than print(), and the message format is more strict with \%. 
while print() works both formatted message and message as string simply: 12print("1+1 = ", num)print("1+1 = %d", % num) the following is a general log class, which can plug in any existing Python project: import logging import os class log(object): def __init__(self, logger_name): self.log = logging.getLogger(logger_name) self.log.setLevel(logging.DEBUG) console_handler = logging.StreamHandler() console_handler.setLevel(logging.DEBUG) self.log.addHandler(console_handler) def set_output_file(self, filename): file_handler = logging.FileHandler(filename) file_handler.setLevel(logging.INFO) formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') file_handler.setFormatter(formatter) self.log.addHandler(file_handler) def main(): test = log('scenario-mm') file_name = "record.log" dir_name = "/home/python_test/" try: os.makedirs(dir_name) except OSError: pass file_path = os.path.join(dir_name, file_name) test.set_output_file(file_path) test.log.debug("debug in episode 1...") test.log.info("info ...") test.log.warning("warn ...") test2 = log("zjjj") test2.set_output_file('record.log') test2.log.info("test2 info") test2.log.warning("test2 warn") if __name__ == "__main__": main() pygamereferpython automatic test series auger: automated Unittest generation for Python unittest official unittest in lg-sim project coverage.py official unittest + mock + tox]]></content>
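As a small complement to the unittest section above, here is what the setUp()/tearDown() hooks and a patch decorator look like together; the Counter class is a made-up stand-in for whatever class is under test. Run it with python3 -m unittest -v as described above.

```python
"""Sketch of setUp()/tearDown() plus patch; Counter is a hypothetical class under test."""
import unittest
from unittest.mock import patch


class Counter:
    def __init__(self):
        self.value = 0

    def bump(self):
        self.value += 1
        return self.value


class TestCounter(unittest.TestCase):
    def setUp(self):
        # runs before every test_* method: a fresh fixture per test
        self.counter = Counter()

    def tearDown(self):
        # runs after every test_* method, even if it failed
        self.counter = None

    def test_bump(self):
        self.assertEqual(self.counter.bump(), 1)

    def test_fresh_instance_per_test(self):
        # setUp gave us a new Counter, so state from test_bump does not leak in
        self.assertEqual(self.counter.value, 0)

    @patch.object(Counter, "bump", return_value=42)
    def test_patched_bump(self, mock_bump):
        # the real method is replaced by a mock for the duration of this test
        self.assertEqual(self.counter.bump(), 42)
        mock_bump.assert_called_once()


if __name__ == "__main__":
    unittest.main()
```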
<tags>
<tag>python</tag>
</tags>
</entry>
<entry>
<title><![CDATA[math in ADS system]]></title>
<url>%2F2019%2F09%2F14%2Fmath-in-ADS-system%2F</url>
<content type="text"><![CDATA[when dealing with plan and control models in ADS system, there are plenty of basic math, and the understanding of these math are definitely helpful to build a better and stronger algorithm. to design a ADS software system, there are two parts, the engineering background and software development. enginneering side is the core logic, while software is about how to organize the software and present, especially with new ideas, e.g. cloud based architecture, web user interface. vector analysiscoordinate systemasssumed a Cartesian system [O: e1, e2, e3], a vector r can be represented as (x, y, z). vector operator vector add a + b = (a_x + b_x) i + (a_y + b_y) j + (a_z + b_z) k the triangle inequality is: |a + b| <= |a| + |b| multiply scalar a . b = a . b_x i + a . b_y j + a . b_z k namely, by multiplying each direction component of the vector by a. dot product a . b = |a|.|b|. \cos = a_x . b_x + a_y . b_y + a_z . b_z a simple physical explaination: assume a is displacement, b is force, then the power of force in the displacement is by dot product. the geometric meaning of dot product is projection. namely, |a|cos means project of vector a on vector b, which also gives the decomposion of a, which has a unique projection component vector and vertical component vector. a = a_p + a_v a_p = ( a . e_p ) . e_p cross product assume three vectors non-co-plane, (a, b, c), rotating start from a to b, the thumb point to c, which is a right-hand-coordinate system. a simple physical meaning: assume a is the force, b is the force arm, then a x b gives the torque, which perpendicular to plane. a x b = |a||b| \sin the geometric meaning of cross product is the area of parallelogram by . base vectors in right-hand Cartesian coordiante system satisfy: ei x ej = ek (i != j) mixed product $$ a x b \cdot c $$ which gives the volume of parallel hexagonal propped by curve(volume) integration Gauss integration Stokes integration scalar field directional derivative assuming vector l in space, its unitfied vector can be present as directional cosine [cos\alpha, cos\beta, cos\gamma]^T. for any function u=u(x, y, z), who is derivable at M0(x0, y0, z0), then: \frac{\partial u}{\partial l} = dot_product(\Delta u, <\cos \alpha, \cos \beta, \cos \gamme> ) gradient \Delta u = \frac{\partial u}{\parial x} \b{i} + \frac{\partial u}{\parial y} \b{j} + \frac{\partial u}{\parial z} \b{k} the directional derivative which gives the maximum of \Delta u at a point, it’s the gradient direction. the gradient direction in space, stands for the direction from lowest co-value layer to highest co-value layer, which in physical, means the most high rate of change in general. Halmilton operator \Delta = \frac{\partial}{\parial x} \b{i} + \frac{\partial}{\parial y} \b{j} + \frac{\partial}{\parial z} \b{k} vector field directional derivative the gradient field of a vector field gives a tensor field, which raise the dimension one more. the dot product(inner product) of a vector field, decrease to scalar field, the cross product of a vector field keeps a vector field. for a vector field, usually talk about its flux and divergence. 
analytical geometry plane equation assume a plane \pi in the Cartesian coordinate system [O: X, Y, Z]; O's projection onto the plane is N, the directional cosines of ON are (l, m, n) and |ON| = p. for any point P in the plane, NP is always perpendicular to ON, namely: $$ \vec{NP} \cdot (l, m, n) = 0 $$ as $ l^2 + m^2 + n^2 = 1 $, this gives: $$ lx + my + nz - p = 0 \quad (1) $$ equation (1) is the normalized plane equation, (l, m, n) is the normal vector of plane \pi, and p should be no less than 0. line equation assume a line goes through point P_0 $(x_0, y_0, z_0)$ in a direction with directional cosines (l, m, n); then any point P $(x,y,z)$ on this line satisfies: $$ x - x_0 = |PP_0|\,l, \quad y - y_0 = |PP_0|\,m, \quad z - z_0 = |PP_0|\,n $$ taking |PP_0| as the parameter t, the equations above are the parameterized line equations, namely: $$ \frac{x-x_0}{l} = \frac{y-y_0}{m} = \frac{z-z_0}{n} $$ in general, a line is the intersection of two planes, so its general equation is to satisfy both plane equations: $$ A_1 x + B_1 y + C_1 z + D_1 = 0, \quad A_2 x + B_2 y + C_2 z + D_2 = 0 $$ coordinate transfer in general, a coordinate transfer means a linear transform of the base vectors. assume an original coordinate system [O: e1, e2, e3] and a new coordinate system [O': e1', e2', e3']; both the original base vectors and the new base vectors <e1', e2', e3'> span the same 3D linear space, and there is a transfer matrix between them: $$ e = A_{3\times3} \cdot e' $$ for any point P transferred from the original to the new coordinate system: $$ \{x\} = \{a\} + \{x'\} A_{3\times3} $$ where $$ A_{3\times3} = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{bmatrix} $$ each row of the matrix above stands for one new base vector expressed in the original base, namely: $$ e_1' = a_{11} e_1 + a_{21} e_2 + a_{31} e_3 $$ and the component of a point along each new coordinate axis can be obtained simply by: $$ x_1' = e_1' \cdot \vec{O'P} $$ refer write mathematic formulas in Markdown]]></content>
</entry>
<entry>
<title><![CDATA[Where are you in the next 5 years (10)]]></title>
<url>%2F2019%2F09%2F14%2F%E6%9C%AA%E6%9D%A55%E5%B9%B4%E5%9C%A8%E5%93%AA%E9%87%8C-10%2F</url>
<content type="text"><![CDATA[个人与公司的生存和发展法则。因为企业生存状态以及老板的眼光,国内企业会有一种氛围:老板花钱是让来工作的,来学习都不该提倡。个人而言,学习才能保持议价权。也需要有个良好的心态,面对这样的企业现状。 国内就业大环境也长期不看好。本来优质的大公司和岗位不多,相对缺乏壁垒的,最近的一个新闻: 一汽大众不招传统专业,比如,车辆、机械。不过话说回来,一汽大众的工作在长春也是绝对高薪。不招人背后,是技术岗位都在德国,那国内的年轻人在这样的企业怎么积累? 当然,即使在这样的企业内,还会分三六九等。年轻人被局限的穷迫程度可想而知,这样缺乏格局的环境,也没办法培养年轻人的领导力。 上一篇 提到北欧国家。国家大环境,确实对年轻人的选择有太大影响。国内年轻人如果能选择行业,确实应该避开传统制造业、农业等低增长周期的企业。当然,互联网、金融、地产、消费服务等快增长周期的行业,算适应国家的大环境,人也不轻松。 国与家选择一个国家,就是选择了一套制度。在中秋夜,每有悠闲的时候,心里反而更沉,似乎只有加班工作,可以抵消内心深处的不安。 回来看了很多关于中国发展的报道《中国制造2025》以及生活现状:住在筒子楼里的爱情,缺乏社会秩序和行业规则的生活和工作环境,慢慢溢出。 回国前,对国内的生活的1万种可能,都不包括现实的样子。 小镇青年的生活,完全是另一个折叠。我以为的真实生活到底只是我愿意认知的真实吧了,包括对国与家的态度。有时候觉得,是不是自己跟周围的人太格格不入了,可惜看到《计算机应用》学报上的文章,真的很无奈,大部分年轻人在这样的状态下追求的是什么价值? 商业的逻辑小城唯一的一家沃尔玛,进去看到商品的丰富,真的是感动了一下。想当年在美国,出入walmart, whole food, wegments, kroger, 虽然也感谢生活的便利舒适和优雅,不过没有这么渴望。 在国内生活,思路也发生了转变。进沃尔玛看到陈列的各式衣服、首饰,第一反应,竟然是如何成为沃尔玛的供货商。这个平台大到足够小富即安,年入百万。这样的思路,放在美国生活的时候,估计有点丧心病狂。 也可能国内的环境慢慢教人认识到:商人和企业家是不同的吧。 记得之前评价恒大造车,说他是对行业的玷污,那不过是把恒大想成了汽车行业的企业家,需要对这一行有超出商业范畴的责任。不过人家本质上是一名商人。 读《中国制造2025》另外一个感受,就是制造业升华的制约,已经不在制造业本身,而是体制。企业家、商人做了他们该做的,甚至在中国这样行业秩序、商业秩序、社会秩序相对匮乏的条件下,他们已经做了足够多了。 记得有位中国vp评价说,相比中国的投资人,哪有外国的投资团队的活路。大概就是想表达,中国各行各业的生存之多艰,相对好的制度下的培育的行业团队,中国行业是颇有杀伤力的。 当然,这样的杀伤力更是“劣币驱逐良币”。长久看是恶化商业环境,既不是鼓励了资源优化、也不是鼓励创新、更不是鼓励更人性的生存环境。所以这样的“强势”,真叫人哭笑不得。也大约是全世界都逐渐清醒地抵制中国发展模式的根因。 商人与企业家的不同,更突出的体现在mindset不同。于我,可能默认自己对行业还有一些初心,不是唯利是图的。所以,愿意了解的都是行业内技术、管理实践等等。当我在拼命地理解和转化自动驾驶新技术和趋势,对国外成熟管理模式的移植,当然也很力不从心。开淘宝店,开咖啡馆的朋友,根本没有这样的mindset。服务行业不是没有科学管理和技巧可言,只是偏技术迭代的mindset偏离了服务的本质。 然鹅,在基础科学研究和应用科学转化上的各种弊病,以及反应在知名企业可供选择的求职口上的差距,实际上让国内想成长为企业家的年轻人,或安心当工程师的年轻人,是非常吃亏的。即便比美国年轻人更努力,也不会比他们生活的更舒适和自信。在这样的实体环境下,谈技术积累,应用转化,企业家精神,就像明知无望,还让人上的理想主义。可是现在的年轻人,才不会为了“爱国“二字义无反顾,国家欠着我等年轻人多了,收割我等韭菜,还要我等为着看不到的未来埋单。 相比实业,在国内,服务业的发展,至少不会明显存在差距。因为人总是要吃喝拉撒被服务的,虽然服务质量、体验、管理等,不如发达经济体,但是服务业没准备国际化:开饭店的、洗衣房的、健身房的等等,没准备打入国际市场,总之还可以混个体面生活。 国家抱怨国内的小老板,小企业家,小富即安,不思进取,不能担当“百年老店”。说实话,心里都明白:有制度保障,企业家大富吗,保障企业百年长青吗?与其显山露水,被制度卡脖子,大家都不愿做被拍死的出头鸟。也大概是万一做大了,就赶紧换国籍的原因。 我不再抱怨企业家、商人和个体,什么样的社会体制孕育企业和个人。说国内企业无奸不商,老百姓缺乏仁心仁德,只是国家的问题。在对国内实业无望之后,大量实业人才流入金融证券等服务业,大量企业家沦为商人,投机买地,也是无奈之举。不知道还有没有明天,今天就做一回人吧。谁不爱惜一次的生命呢。 两个人都挣50万,在学校承包食堂的经理a和在大公司上班的工程师b,显然是不同的。 国内混久了,初心早就喂狗了。活着,才是中国人的最高哲学。 ps, 不吐槽,不快乐。you just can’t change your role in life, even you don’t like it sometimes, so enjoy it happy or unhappy. 个人技能进阶带团队,会有一个时间段特别兴奋,感觉每天都有很多进步;也有一个时间段,非常阻塞,看不清下一步。又在一个阻塞期,想到了底层实力。 底层实力做技术/工程应用,前期缺乏很高明的应用框架设计,大部分时候就是解决具体的问题。手段如何都不重要。大概是测试驱动的开发,不断调试,试错,直到找到一个通畅的解决。下次再碰到新问题,继续试错调整。这样的工作模式,也许不算agile。 目前阶段开发的几个层次: 熟悉项目,写简单的逻辑 使用成熟的库/框架(网络,前端,分布式) 掌握新编程思想(协程、异步、容器、微服务) 与此同时,对开发工具,生产环境(git, Docker),应用场景的拓展逐渐成型。 产品导向缺乏底层实力,会容易被卡住。或者说缺乏产品导向,走着走着,不知道下一步在哪儿了。产品导向,需要一个人多任务,多人协调能力。 初创公司生存技术突破做ai等技术场景的初创公司,花大量的资源和人力,研究和实现新算法,提高算法的识别率,精度等。 产品入口相反,传统行业的创业,并没有领先技术和雄厚资本,总是从最不起眼的组装产品开始的。 资本壁垒一些规模效应的创业团队,投资人大量投入资本拼杀。]]></content>
</entry>
<entry>
<title><![CDATA[npc wp planner in ADS simulation]]></title>
<url>%2F2019%2F09%2F11%2Fnpc-wp-planner-in-ADS-simulation%2F</url>
<content type="text"><![CDATA[backgrounda ADS simulation platform should have the three components: vehicle dynamics model for game engine(e.g. unReal, Unity3D) based ADS simulation platform, there has simple game-level vehicel dynamics(VD), and in engineering side, there are more precise VD, such as Carsim, CarMaker, and a few smaller ones. sensor models currently camera and Lidar model are pretty good, for mm Radar, which has a frequency around 24G hz or 77G hz, which is far more beyond the digital computer can sample, so to simulate Radar(or Radar model) is not that practicalable. as can find in famous simulator, e.g. lg-sim and Carla, they are puting a lot efforts in sensor models, which should be more useful in future. another good way of sensor model simulation, is to test and verify the vendor’s sensors. most time, the performance test from vendors are not fit well. and as the physical model of the sensors is not visible, to simualate as a black box is useful then. scenario describe PEGSUS project used OpenScenario xml language; in other case, the scenario can be described directly in Python. basically what need to define during scenario descibtion, includes: scenario static env scenario npc actors movement in more advanced solutions, the process of builing virtual envs directly start from dronze/cars images or cloud point images. what’s in this post is one part of npc movement control: lane change. lane_change_done eventat default lg-sim, there is a lane-change event, which describes when lane change happens; for a more precise npc control, here adds a lane-chagne-done event, basically to describe when the lane change is done. similar to lane-change event, here defines the lane-change-done event in both server and client side. Unity.transform.position vs local positionhere is the math part. Unity has plenty APIs to handle these, which is good to study in. dot product Vector3.Dot() cross product p2p distance Vector3.Distance() interpolate Mathf.Lerp() world 2 local transfer Transform.InverseFromPoint(position) camera matrix equations of lines waypoints described laneNPC agents are drived by waypoints embedded in the virtual roads, in lg-sim, waypoints is inside LaneSegmentBuilder. but in general the waypoints are not ideal: their distance is not unique their index in neighboring lanes doesn’t keep consistent they don’t care of curves of real roads for waypoints based planner, rather AI-based planner, e.g. lane-change planner, the processes are as following: get currentTarget in target(current) lane which is usually not pointed by the same currentIndex as the previous lane, so need to figure out the closeset wp from the waypoint list in current lane, and this closest wp should be in front of NPC (no consider NPC retrograde). keep currentTarget update during NPC (lane-change) operation, there is case when the currentTarget is behind NPC position, if it’s not expected, it’s always a error planner, leading NPC header turn over. so need to check currentTarget is always in front of NPC, if not, update currentTarget to next wp. 1) driving direction vs currentTarget2NPCposition driving direction in global dot product currentTarget2NPCposition should greater than zero, if not, update currentTarget 2) NPC in currentTarget local position NPC local position should always in negative axis, if not, update currentTarget. 
the trick here: we can't express currentTarget in the NPC's local frame, because when the NPC gets ahead of currentTarget the NPC heading is effectively reversed relative to it, which flips the local coordinates as well; currentTarget, however, is always a fixed wp, so we express the NPC in currentTarget's frame instead. lane-change-done criteria ideally the lane-change operation is done when the whole NPC body occupies the target lane, but that never matches reality: by then, if there is a next-next lane, the NPC is already partly in that lane, so the lane change cannot be kept within a single neighboring lane. in reality, on a highway or expressway a vehicle should not cross two lanes in one move. so to declare lane-change-done within just the next lane, the criterion is that the NPC position, projected on the Z or X direction, is within half a lane width of currentTarget's projection on that direction: Mathf.Abs(frontCenter.position.z - currentTarget.z) < laneWidth/2.0 usually in a game engine the NPC can also be controlled by AI, which will be discussed later.]]></content>
<tags>
<tag>lgsvl</tag>
</tags>
</entry>
<entry>
<title><![CDATA[a nested conditional variable model in python]]></title>
<url>%2F2019%2F09%2F06%2Fa-nested-conditonal-variable-model-in-python%2F</url>
<content type="text"><![CDATA[background the work is derived from lg-sim/lgsvl/remote, the original remote function is listen message return from server on each command request, basically a async request-receive model. additionaly, we want lg-sim server to send every episode state info to client too, luckily, the server - client communicate is based on Websocket,which support server actively pushing message. so simple add a episode state info in server side and send it through websocket at every frame works. in conditional variable design, notify() won’t be hanged-up, while wait_for() can be hanged-up if no other thread call notify() at first. in Python class, to keep object status, it’s better to use class member variable, rather than object member variable, which can’t track its status when the object reset. (/) try finally the status update in try() package won’t be the final status . 12345678910111213141516 def process(type): status = False try: if(type=="a"): status = True finally: if(type == "b"): status = False print("cv.status ....\t", status) return statusdef main(): type_list = ["a", "a", "b", "b", "a", "a", "a"]; for _ in range(len(type_list)): print(process(type_list[_])) the client designin client, the message received is handeled in process(). by default, there is only one type of message, namely the event message, so only one conditional variable(CV) is needed to send notification to command() to actually deal with the recieved message. first we defined the new message type(episode message), as well as a new real-handel fun episode_tick(). then modifying remote::process() as: 123456789101112131415try: self.event_cv.acquire() self.data = json.loads(data) if check_message(self.data) == "episode_message" : self.event_cv.release() self.event_cv_released_already = True with self.episode_cv: self.episode_cv.notify() else : self.event_cv.notify()finally: if self.event_cv_released_already: self.cv_released_already = False else: self.cv.release() will it dead-locked ?as both command() and episode_tick() have conditional variable wait_for() : 1234567public command(): with self.event_cv : self.event_cv.wait_for(lambda: self.data is not None)public episode_tick(): with self.episode_cv.wait_for(lambda: self.data is not None) so if notify() is not called in any situation, dead-locked happens, meaning the wait_for() will never return but suspended. remote.process() is running in another thread, rather than hte main sim thread, where run sim.process(). remote.process() used to accept the message, and sim.process() is the actual place to handle these received messages. the two threads run simutaneously. for any message received, if its type is event_message, then it triggers event_cv.notify(), so command() in sim::process() won’t dead block; to avoid episode_message dead locked, in sim::process() need to call episode_tick() only when the message type is episode_message, which can be checked by remote.event_cv_released_already == True, 123456789101112def sim::process(events): j = self.remote.command(events) while True: if self.remote.event_cv_released_already : self.remote.episode_tick() if j is None: return if "events" in j: self._process_events(j) j = self.remote.command(“continue")]]></content>
<tags>
<tag>python</tag>
<tag>multithread</tag>
</tags>
</entry>
<entry>
<title><![CDATA[vehicle dynamics model in ADS simulation]]></title>
<url>%2F2019%2F09%2F03%2Fvehicle-dynamics-model-in-ADS-simulation%2F</url>
<content type="text"><![CDATA[VD backgroundusually to simualate vehicle dynamic(VD) system, either by physical model, e.g. pysical models of engine, gearbox, powertrain; or parameter model, which doesn’t take into the physical process of the dynamic system, but tracking the system’s input, output and fit their relationship with polynomail equation or functions with multi-pieces. traction force is derived from engine torque, which goes to gearbox(powertrain system) and then divided by radius of wheels, then distribute to wheels. traction torque air drag air lift force traction force $$ F(traction) = F(air drag) + F(air lift force) + F(tire drag) + acc * mass $$ in a detail way, the equation above should split into lateral state equation and longitudional state equation, if consider driver control module, which will give laterl control equation and longitudional control equation. brake torque and ABS systemABS(anti-block system) works in the situation, when driver input brake torque is larger than the max ground-tire torque can attached between tire and ground. once max ground-tire torque is achieved, namely the max fore-aft force T is achived, the traction direction traction slip angular decceleration will leap, this is the dead-blocking situation, and when it happens, the driver input brake torque is saturated. to avoid block situation happens, usually track decelleration of traction slip angular during brake torque increase, if the value of decelleration of slip traction angular is beyond a threshold value, ABS controller will trigger to decease brake torque. drive stabilitythe driving stability is mainly due to forces on tires, sepcially the lateral angular velocity derived from Lateral Force lateral controltaking driver as one element, the driveing system is a close-loop control system. the system works on a road situation: the driver pre-expect a driving path(predefined path) and operate the steering wheel to some certain angular the vehicle take a move with a real driving path(real path) the real path is not exactly fitted to the predefined path, leading the driver take an additional conpensation control longitudinal controlsimilar as lateral control VD in Unityany vehicle in Unity is a combination of: 4 wheels colliders and 1 car collider. WheelConllider1) AxleInfo AxleInfo represents the pair of wheels, so for 4-wheel vehicle, there are two AxleInfo objects. 12345678910struct AxleInfo { WheelCollider left ; WheelCollider right; GameObject leftVisual ; GameObject rightVisual ; bool motor ; #enable movement of this wheel pair bool steering ; # enable rotation of this wheel pair float brakeBias = 0.5f; } 2) core parameters wheel damping rate suspension distance Force apply point distance (where ground force act on wheel) suspension spring forwardSlip(slip angle), tire slip in the rolling(tractional) direction, which is used in calculating torque sidewaySlip, the lateral direction slip, which leads to stability issue. 3) visualization the WheelCollider GameObject is always fixed relative to the vehicle body, usually need to setup another visual GameObject to represent turn and roll. 
implementation from lg-sim 1234567891011void ApplyLocalPositionToVisuals(WheelCollider collider, GameObject visual) { Transform visualWheel = visual.transform; Vector3 position; Quaternion rotation; collider.GetWorldPose(out position, out rotation); visualWheel.transform.position = position; visualWheel.transform.rotation = rotation; } 4) WheelCollider.ConfigureVehicleSubsteps 1public void ConfigureVehicleSubsteps(float speedThreshold, int stepsBelowThreshold, int stepsAboveThreshold); Every time a fixed update happens, the vehicle simulation splits this fixed delta time into smaller sub-steps and calculates suspension and tire forces per each smaller delta. Then, it would sum up all resulting forces and torques, integrate them, and apply to the vehicle’s body. 5) WheelCollider.GetGroundHitreturn the ground collision data for the wheel, namely WheelHit wheel friction curvefor wheels’ forward(rolling) direction and sideways direction, first need to determine how much the tire is slipping, which is based on speed difference between the tire’s rubber and the road,then this slip is used to find out the tire force exerted on the contact point the wheel friction curve taks a measure of tire slip as an Input and give a force as output. The property of real tires is that for low slip they can exert high forces, since the rubber compensates for the slip by stretching. Later when the slip gets really high, the forces are reduced as the tire starts to slide or spin 1) AnimationCurveunity official store a collection of Keyframes that can be evaluated over time vehicleController() in lg-sim1) the controllable parameters: 123456currentGear currentRPMcurrentSpeed currentTorquecurrentInput steerInput 2) math interpolate function used Mathf.Lerp(a, b, t) a -> the start value b -> the end value t -> the interpolation value between start and end 3) fixedUpdate() 1234567891011121314// cal trace force by rigidbodyrigidbody.AddForce(air_drag)rigidbody.AddForce(air_lift)rigidbody.AddForceAtPosition(tire_drag, act_position)// update current driving torquecurrentTorque = rpmCurve.Evalue(currentRPM / maxRPM) * gearRaion * AdjustedMaxTorque // apply torque // apply traction control// update speedcurrentSpeed = rigidbody.velocity.magnitude// update fuel infofuelLevel -= deltaConsumption// update engine temp// update turn signal light 4) ApplyTorque() 123456float torquePerWheel = accelInput * (currentTorque / numberofWheels) foreach(axle in axles): if(axle.left.motor): axle.left.motorTorque = torquePerWheel if(axle.right.motor): axle.right.motorTorque = torquePerWheel 5) TractionControl() 123456789101112TractionControl(){ AdjustTractionControlTorque(axle.hitLeft.forwardSlip)} AdjustTractionControlTorque(forwardSlip){ if(forwardSlip > SlipLimit) tractionMaxTorque -= 10 else tractionMaxTorque += 10 } in lg-sim, the VD model is still simple, as there is only traction/logitudional control. referadd equation in markdown wheelcollider doc whell collider official whellcollider tutorial]]></content>
<tags>
<tag>lgsvl</tag>
<tag>vehicle dynamics</tag>
</tags>
</entry>
<entry>
<title><![CDATA[Where are you in the next 5 years (9)]]></title>
<url>%2F2019%2F08%2F31%2F%E6%9C%AA%E6%9D%A55%E5%B9%B4%E5%9C%A8%E5%93%AA%E9%87%8C-9%2F</url>
<content type="text"><![CDATA[低速发展和高附加值昨天看到一篇比较中欧(美)国家差异的帖子:北欧国家是如何保持高福利的。非常有意思的一个概念:高附加值,低发展速度的行业。 这些国家有一些传统强势行业。很有意思的是,印象中的传统行业,比如农业、畜牧业、纺织、手工业等等,跟高科技、现代化很远,似乎是非洲国家的土著人干的活儿。相比,在国内,互联网、技术创业、地产、金融、mba营销、国际贸易等才是时尚。 早年在农业大学,对以色列强大的现代农业是有所耳闻的。只是没有这种洞察力,现代化的传统行业与国民福利之间还有这样的隐形关系。 相比,美国的农业、畜牧业等传统行业很不错;美国也引领科技、金融这些时尚的行业。在美国生活的人民确是有了进退的余地。向左,可以退居大农村;向右,可以进军纽约、硅谷当金融、科技新贵。 国家的选择看似主动,也是现实的驱动。国家可以集中发力在互联网、金融、创业等方面做的很有声有色,但与此同时,传统的制造业,典型低发展速度,需要长期积累,国家发力也解不了。比如,汽车发动机、精密仪器、芯片制造等。 制造业在国内的标签是低附加值,资本不愿意进入,国家队发力也没气色。然而经过了这些长期积累的国家,比如欧洲小国瑞士、德国,制造业出口完全是高端制造的典范。而且马太效应显著,让这些欧洲国家不需要社会剧烈的经济变动,不需要老百姓担心行业变动、反倒一年还有20天带薪假。政府和人民都其乐融融,有时候看北欧人的生活,建筑、厨房、国家公园、办公环境、甚至监狱,简直像在童话里;报道一个小小的新闻,似乎都要惊动整个镇子上的人一样,这里简直就生活在像伊甸园。 而中国总是一片火热的场景,5年大力发展基建、5年大力吹捧全国创业、又5年搞工业互联网。国家没有消停,老百姓有人欢喜有人愁。 中国没有先发优势,在这些传统行业缺少长期积淀,强行进入既不不讨资本喜欢,也不讨人喜欢。比如报道,西安火箭研究员工资20万;大国重器的老焊工,无数荣誉奖章,一个月才一万多,在北京买不起房;国家队出动安利中小企业贷款,但银行就是宁愿带给亏损的国企,也不给需要钱的中小企业。 对于这些需要长期资本补血,短期回报低,才能形成壁垒的传统行业。在这个资本、市场竞争相对成熟的年代,确实比100年前,搞种植、搞养殖,或西门子、博世等制造业刚出道积累行业(非金钱资本)“资本”要困难的多。似乎印证了上周接触深圳汽车电子行业的感触,一个词:活着。根本谈不上行业积累。 可以说,北欧国家以及北美国家,因为上几代人的积累,给他们留下的行业遗产、社会制度遗产,才允许他们活的这么滋润。相比,中国没有上几代人的遗产,又进入了成熟资本市场的21世纪,所以,是压力也是机遇。去发掘新世纪的矿吧。 对这个大环境的从业者而言,选择农业、传统制造业,也确实不聪明。干着比人累的活儿,挣得比人少,社会还不待见。年轻人选择行业要识大局。相比,这个时代热的东西,确实该多关注。存在即合理。 工程师的心态国内有些企业家喜欢讲情怀,吹“工匠精神”, 结果把自己作死了。而华为这样的企业,可以拿高薪,也是加班拿命换的。能够心安理得的当一名工程师,不需要为生计、价值实现、社会待见操心的,只会在这些“低速、高附加值”的欧美公司。一个国家的企业能进化到这种形态,基本是社会稳定,企业稳定,developed,达到了一个最佳平衡点,再投资也不会增加收益,每个岗位都相对稳定,组织内部管理效能达到最优,系统的规范化远超过个人能力,所以每个员工心安理得,做好自己份内的事就好。剩下的时间,享受生活,或者出于兴趣爱好,搞点车库创业都不在话下。 相比,国内的市场还在巨变当中,没有一家企业已经进化到developed,企业组织系统还不成熟,个人能力还能发挥显著价值,那老老实实当工程师的,在还在发生资源重组系统里,就会被挤到最底层。所以,不要怪国内的年轻人心态不正。因为这个时代,传统技工不会长在国内,国内应该培养属于这个时代的阶层和行业。想跟欧美,比工匠精神,确实是拿短处跟人家的长处比。]]></content>
</entry>
<entry>
<title><![CDATA[reactjs introduction]]></title>
<url>%2F2019%2F08%2F30%2Freactjs-introduction%2F</url>
<content type="text"><![CDATA[React is the front-end framework used in lg-sim WebUI(version 2019.07). as well for Cruise webviz, Uber visualization, Apollo, they all choose a web UI design, there is something great about web framework. what is Reactused to build complex UI from small and isolated pieces of code “components” 1234567class vehicleManager extends React.Component { render() { return <html></html> } } React will render the html on screen, and any changes in data will update and rerender. render() returns a description(React element) of what you want to see on the screen, React takes the descriptions and displays the result. build a react hello-world app12345npx create-react-app react-democd react-demonpm start index.jsindex.js is the traditional and actual entry point for all Node apps. in React, it tells what to render and where to render. componentscomponents works as a func, and props is the func’s paramters, the func will return a React element which then rendered in view components can be either class, derived from React.Component, which then has this.state and this.setState() ; or a function, which need use Hook to keep its state. the design advantage of components is obvious: to make components/modules reuseable. usually in route.js will define the logic switch to each component based on the HTTP request. setStatewhenever this.setState() takes an object or a function as its parameter, update it, and React rerender this component. when need to change a component state, setState() is the right way, rather than to use this.state = xx, which won’t register in React. Hookexample from stateHoook 12345678910111213141516import { useState } from 'react';function Example() { // Declare a new state variable, which we'll call "count" const [count, setCount] = useState(0); return ( <div> <p>You clicked {count} times</p> <button onClick={() => setCount(count + 1)}> Click me </button> </div> );} Hook is a speicial feature, used to share sth with React function. e.g. useState is a way to keep React state in function, which by default has no this.state in Hook way, even afer function() is executed, the function’s variable is not clear, but keep until next render. so basically Hook give a way to make function stateable. intial hook the only parameter pass in hook::useState(new Map()) is the initial state. useState(initState) return from useState it returns a pair [currentState, setStatefunc], e.g. [maps, setMaps], setStatefunc here is similar as this.setState in class component access state directly {currentState} update state {() => setStatefunc(currentState)} Context context gives the way to access data, not in the strict hierachy way. the top is the context Provider, and the is the context consumer, there can be multi consumers. createContextas in react context official doc 1const Context = React.createContext(defaultValue); during render, the component which subscribe this context will get the context content from its context provider and put in rendering, only when no avialable provider is found, defaultValue is used. 1<Context.Provider value={/xx/} > whenever the value in Provider changes, all consumers will rerender. EventSourceEventSource is HTTP based one-way communication from server to client, which is lighter than Websocket eventsource vs websocket, also the message type is only txt in EventSource, while in Websocekt it can be either txt or binary stream. by default, when client received a [“/event”] message, will trigger onMessage(). 
but EveentSource allow to define user-speicial event type, e.g. VehicleDownload, so in client need to a new listener: 1myEventSource.addEventListener('VehicleDownload', (e)=>handleVehicleEvents(e)) react routejs specialfirst class object a function is an instance of the Object type a function can have properties and has a link back to its constructor method can store a function in a variable pass a function as a parameter to another function return a function from another function promisearrow function]]></content>
<tags>
<tag>react</tag>
</tags>
</entry>
<entry>
<title><![CDATA[nancyfx study]]></title>
<url>%2F2019%2F08%2F30%2Fnancyfx-study%2F</url>
<content type="text"><![CDATA[document, Nancy is used in lg-sim webUI 2019.07 version, pretty new staff for me. IntroductionNancy is a lightweight, low-ceremony framework for building HTTP based services on .NET and Mono. Nancy is designed to handle DELETE, GET, HEAD, OPTIONS, POST, PUT and PATCH request and provides a simple, elegant Domain Specific Language(DSL) for returning a response. build to run anywhereNancy was designed to not have any dependenceis on existing frameworks, it’s used pretty much wherever you want to. host in Nancy acts as an adaptor for a hosting environment, thus enabling Nancy to run on existing techs, such as ASP.NET, WCF. the bare minimum requires to build a Nancy service are the core framework and a host. helloworld serviceall module should be derived from NancyModule, and define a route handler. tips: always make the module public, so NancyFx can discover it. 12345678public class HelloWorld : NancyModule { public HelloModule() { Get["/"] = parameters => "Hello World" ; }} exploring modulesmodule is where you define the logic, is the minimum requirement for any Nancy app. module should inherit from NancyModule, then define the behaviors, with routes and actions. modules are globally discoveredthe global discovery of modules will perform once and the information is then cached, so not expensive. Define routesto define a Route need to specify a Method + Pattern + Action + (optional) Condition 123456789public class VehicleModule : NancyModule{ public VehicleModule() { Get["/vehicle"] = _ => { // do sth }; } } or async run: 123456789public class VehicleModule : NancyModule{ public VehicleModule() { Get["/vehicle", runAsnyc: true ] = async(_, token) => { // do sth long and tedious }; } } Methodmethod is the HTTP method used to access the resource, Nancy support: DELETE, GET, HEAD, OPTIONS, POST, PUT and PATCH. secret for selecting the right route to invokein case when two routes capture the same request, remember : the order in which modules are loaded are non-deterministic routes in a given module are discovered in the order in which they are defined if several possible matches found, the most specific match. root pathall pathes used in Nancy are relative to root path, which tell Nancy where its resources are stored on the file system, which is defined in IRootPathProvider static contentstatic content is things e.g. javascript files, css, images etc. Nancy uses a convention based approach to figure out what static content it is able to serve at runtime. Nancy supports the notion of having multiple conventions for static content and each convention is represented by a delegate with the signature Func<NancyContext, string, Response> the delegate accepts two parameters: the context of the current request and the application root path, the output of the delegate is a standard Nancy Response object or null, which means the convention has no static content. define your own conventions usign the bootstrapperlink View enginesview engine, takes a template and an optional model(the data) and outputs(usually) HTML to be rendered into the browser. in lg-sim, the view is rendered in nodejs. MVCmodel view controller understand controller a controller is reponsible for controlling the flow logic in the app. e.g. what reponse to send back to a user when a user makes a browser request. any public method in a controller is exposed as a controller action. understand view a view contains the HTML markup and content that is send to the browser. 
in general to return a view for a controller action, need to create a subfolder in the Views folder with the same name as the controller. understand model the model is anything not inside a controller or a view. e.g. validation logic, database access. the view should only contain logic related to generating the user interface. the controller should only contain the bare minimum of logic required to return the right view. C# anonymousc# programming guide 12345(input-parameters) => expression (input-parameters) => {<sequence-of-statements>}TestDelegate testD = (x) => {(Console.WriteLine(x);}; => is the Lambda operator, on its left is input paramters(if exist). Lambda expression/sequence is equal to a delegate class. delegate is a refer type, used to pass one function as paramter to another function. e.g. in event handle. Nancy in lg-sim WebUI/Assets/Scripts/Web a few steps to build Nancy server: Nancyhost.start() add NancyModule instance(where define route logic) a few other libs used : PetaPoco, a light-weight ORM(object relational mapper) framework in C# SQLite refermeet nancy nancy doc in chinese `]]></content>
<tags>
<tag>C#</tag>
<tag>http</tag>
</tags>
</entry>
<entry>
<title><![CDATA[Where are you in the next 5 years (8)]]></title>
<url>%2F2019%2F08%2F24%2Fwhere-are-you-in-next-5-years-9%2F</url>
<content type="text"><![CDATA[这一系列的思考来源两个事件:深圳汽车电子行业代表来司交流;与全球机器人大赛参展。 深圳的中小企业 vs 资本大户布谷鸟 智能驾驶座舱解决方案供应商。主要产品,车载显示屏,仪表屏和基于Android操作系统的上层应用。展示完,第一感受就是深圳华强北来了。 当年(2007年),中国风靡深圳山赛手机。甚至多年以后,山赛智能手机成了中国发展的一个典型案例,为众人津津乐道: 眼看他起高楼,眼看他宴宾客,眼看他楼塌了。 这个产业最初蓬勃,但凡投资进去的人,都躺着赚钱,只是没能力在蓬勃发展期间为转型蓄力。后来政策、市场、甚至原材料的任何风吹草动,都足以摧毁它。 这个产业的问题放大了看,就是 华为/阿里 与 中兴/联想 的对比。国内的制造业小企业,最初都是代理、组装、贴标签、山赛货。如果碰到市场爆发,春风得意,赚的盆满钵满。但是,后续如何发展,创始人的vision就立判高下了。 一个选择是立足长远利益,未雨绸缪,生于有患。在打开了市场之后,立马投入产品和相关技术、供应链、品牌积累。可以接受眼前的低资金回报,选择了艰辛但持久的终成就行业头部的道路。 一个选择是只看赚钱。自己做产品积累,回钱周期太久,根本不考虑。资本趋利,大不了赚到钱,再转战下一片红海。这是势利的商人思路,他们对某一个具体行业不带任何感情,冷血。任何行业只是一个资本生长的土壤,不管是做手机、地产、保险、造车等等,都不是出于热爱,而是资本逐利。比如,宝能收购万科,就是冷血的资本挑战性情的行业中人。 但凡对行业还有所爱的,都难以忍受被资方强奸。所以,恒大站出来说要造车,我的第一反应,就是恒大要来强奸造车人。恒大没有对汽车的感情,只不过跟进资本进军汽车领域。资本原本是公司的生产要素之一,但是资本又最不具有公司界限约束,任何外部资本都能挑战公司自身。所以,如何让资本服务于行业公司? 制造业创业回到布谷鸟,号称做车载计算平台,不自己做计算芯片,不自己做车载屏幕,不自己做车载定制Android操作系统,顶多开发几个上层app。这不又是一家华强北组装厂吗。创业还靠廉价组装竞争?相比,北上的创业公司,诸如硬件地平线、软件旷视等算是技术/境界高多了。 也不得不提本司。背靠大树,天然壁垒,但实际做的还是华强北的活儿。不同之处是,资本压力小,对技术储备有规划但没有环境和姿态的转变。会聘一两个外国人、一两个教授撑面儿,但这几位基本处于边缘,决策、项目规划都不考虑他们。估计其他家,也好不了哪里去。要不然,市场会有反馈的。 这样的团队/小公司里面,对想要拔高产品/技术见识的年轻人其实很艰难。因为打开市场,销售是关乎存活的,相比产品是自研的,还是贴牌的,有没有底层开发/设计积累都是次要的。 要去这种小团队做产品/技术,就需要能单挑担子的产品人/技术大牛。当然,付出与回报怎么权衡。给ceo待遇,似乎可以考虑。 传统制造业大公司里面的小团队,虽然没有自己去开辟市场的压力,大不了内销,但同时带来的问题,是积重的公司元老,对职业经理人,工程师文化,都是严重的挑战。 公司元老经常会看到,职业mba人在中国企业水土不服,或者回国的硅谷工程师对中国企业文化的抱怨。虽然,宏观数据国内企业似乎都很国际化了,底层的/微观的现代公司管理/制度上,却不是一代就能跟上国际的。比如,mbaer到中国公司,很难战胜资源/权利在握的元老们,按照现代管理办法去调整公司,那基本上就等着被边缘化,然后公司不养闲人,下一步就是踢出去。公司又回到原来的管理模式。老人们再次证明自己不瞎折腾,是公司的积重,继续升官发财。 工程师文化,不管是如google等互联网公司,还是如ford等汽车制造业公司,都比较明显。hr, manager, 甚至leader团队把自己归为工程师团队提供服务的,在衣食住行、工作环境、设备、交流、培训等等都照顾的很好。 虽然北美是更成熟的资本社会,他们的企业反而不那么见钱就撒网,没见过ford投资移动互联网,google要去收购造车厂,或者发展文化传媒业务的。他们的业务更倾向纵深,专注,在一个领域深耕,然后成为行业头部。这可能也是成熟资本市场的现代化公司该有的样子。 相比,国内的企业太草莽。国内的工程师只有听命于人。技术积累对绝大多数企业,都是可以谈谈但不会认真做的。这样的情况,当然长远是没前途的。一波政策、市场的红利之后,基本作死。 人口基数国内也有很多有情怀的创业人或中小企业主,他们也许并不甘与政策红利,也想认真做产品。但是,在这个社会没办法。 美国的创业环境,想想apple, google, facebook,cruise等等创业经历。首先他们自己没有生存压力,出于热爱开始的;创业阶段,市场/政策对他们的包容度很好,允许决策失败;然后,资本市场、专业人才的补充都有成熟的供应体系。 国内,一方面有廉价的人口红利,但对大部分创业公司,规模经济反而很难真正成为其产品的利好因素,除了资本玩家,比如,摩拜单车、瑞幸咖啡、yy连锁酒店,滴滴出行等等,这些玩家,根本不是在拼产品,而是资本。谁占有更多资本,谁就能笑到最后。 这样的创业环境,不能培养社会范围内更好的创业土壤,反而破坏了创业各方面的供应系统,包括创始人的初衷,专业人才队伍,资本法务系统建立等等。归结一个词:浮躁。 人口基数大,在政治家、资本家眼里是好词,但是对个体,就意味着就业竞争、服务质量差、人际关系不温存。 经历过大公司的年轻人在三四线小城市,有很多生活无忧的年轻人。他们大学毕业后,在大城市瞎晃了一年半载,找不到合适的企业,就打道回府了。家里有条件,在当地慢慢都会活的滋润。我是觉得,经历下一个现代化管理的大公司,也是个不错的体验。至少知道人类社会,最优秀的组织体系是怎么运作的。 当然,没有上升,一辈子在大公司的系统里打工,就有点无趣。不如回家悠哉悠哉。 世界机器人大会2019整体感受,小企业生存多艰。]]></content>
</entry>
<entry>
<title><![CDATA[real-world env in ADS simulation]]></title>
<url>%2F2019%2F08%2F20%2Freal-world-env-in-ADS-simulation%2F</url>
<content type="text"><![CDATA[this topic is based on lg-sim, the advantage is plenty Unity3D resources. to drive planning design, simulation has to keep as real-world env as possible. for L3 and below, the high-fidelity is actually most about HD map, since in L3 ADS, sensors only cares about the road parternors on road, and planning module take hd map as input. so there is a need to build simulation envs based on hd map. osmin the Unity engine, all in the env(e.g. traffic light, road lanes, lane mark etc) are GameObjects. to describe a map info, one of most common used map format is OSM (another is opendrive, but no open parser in unity yet), however which is not born to used in hd map, but still is very flexible to extend to descripe all info, a hd map requires. e.g. lane id, lane width, lane mark type e.t.c the open-source tool OsmImporter is good enough to parse osm meta data into Unity GameObjects. for different map vendors, first to transfer their map format into the extended-osm format. and then can generate all map info related gameObjects in Unity env. that’s the main idea to create real-world env in simualtor. the end is to make a toolchain from map vendor input to Unity 3D env. waypointsin either lg-sim or carla, the npc vehicles in self-driving mode, is actually controlled by following waypoints in the simulator, and waypoints is generated during creating map in step above. carla has plenty client APIs to manage the waypoints, and then drive npcs. hd-map toolby default, both lg-sim and carla has a tool to create hd-map, that’s basically mark waypoints on the road in an existing map, which is not strong. carla later support Vector one to build in map more efficiently. L3+ virtual env generatorthere are plenty teams working on building/rendering virtual envs directly from sensor data, e.g Lidar cloud point or camera image, and plenty image-AI techs here, which of course gives better immersed experince. and the test task is mostly for perception, data-fusion modules, which is heavier in L3+]]></content>
<tags>
<tag>lgsvl</tag>
<tag>simulation</tag>
</tags>
</entry>
<entry>
<title><![CDATA[threading and websockets]]></title>
<url>%2F2019%2F08%2F16%2Fthreading-and-websockets%2F</url>
<content type="text"><![CDATA[threading.Thread123456789101112run()start()join([time])isAlive()getName()setName() to initialize a thread with: threadID, name, counter start() and run()start() once for each thread, run() will not spawn a separate thread, but run in current thread 1234567891011121314class myThread(threading.Thread): def __init__(self, *args, **kwargs): super(myThread, self).__init__(*args, **kwargs) def run(self): print("run ... from start()")if __name__ == "__main__": demo = myThread() demo.start() demo.join() lock objectsa primitive lock does not belongto a certain thread when locked. by default, when constructed, the lock is in unlocked state. acquire(), will set the lock to locked and return the lock immediately(atom operator); if current lock is locked, then acquire() blocks (the thread) untill the occuping-lock thread calls release(). if multi-thread blocked by acquire(), only one thread will get the lock when release() is called, but can’t sure which one from the suspended threads condition objects12345678910threading.Condition(lock=None)acquire()wait()notify()release() thread A aquire() the condition variable, to check if condition satification, if not, thread A wait; if satisfied, update the condition variable status and notify all other threads in waiting. websocketbackgroundwebocket is better than http when the server need actively push message to client, rather than waiting client request first. clientusually webSocket client has two methods: send() to send data to server; close() to close websocket connection, as well a few event callbacks: onopen(): triggered when websocket instance get connected onerror(), onclose(), onmessage(), which triggered when received data from server. when constructing a webSocket instance, a connection is built between server and client. 1234567891011121314151617// client connect to server var ws = new WebSocket('ws://localhost:8181');ws.onopen = function(){ ws.send("hello server");}// if need run multi callbackws.addEventListener('open', function(event) { ws.send('hello server'); }); ws.onmessage = function(event){ var data = event.data ;};ws.addEventListener("message", function(event){ var data = event.data ;}); ws.send('client message'); onmessage()whenever the server send data, onmessage() events get fired. server onOpen(), triggered when a new client-server connection setup onMessage(), triggered when message received onClose(), onError(), implementationin c#, there is websocket-sharp, in python is python-websockets, in js is nodejs-websocket. as well as websocket protocol is satisified, server can write using websocket-sharp, and client in python-websockets. an example is in lg-simulator. the message type in websocket can be json or binary, so there should be json parse in c#(SimpleJSON), python(json module) and js (JSON). refersending message to a client sent message from server to client python websockets nodjs websocket c# websocket]]></content>
<tags>
<tag>network</tag>
</tags>
</entry>
<entry>
<title><![CDATA[Unet intro]]></title>
<url>%2F2019%2F08%2F16%2FUnet-intro%2F</url>
<content type="text"><![CDATA[unity3d manual High Level API(HLAPI) HLAPI is a server authoritative system, triggered from UnityEngine.Networking authority host has the authority over all non-player GameObjects; Player GameObjects are a special case and treated as having “local authority”. local/client authority for npcmethod1: spawn the npc using NetworkServer.SpawnWithClientAuthoritymethod2: NetworkIdentity.AssignClientAuthority network context propertiesisServer()isClient()sLOcalPlayer()hasAuthority() networked GameObjectsmultiplayer games typically built using Scenes that contain a mix of networked GOs and regular GOs. networked GOs needs t obe synchronized across all users; non-networked GOs are either static obstacles or GOs don't need to synchronized across players networked GO is one which has a NetworkIdentiy component. beyond that, you need define what to syncronize. e.g. transform ,variables .. player GO NetworkBehavior class has a property: isLocalPlayer, each client player GO.isLocalPlayer == true, and invoke OnStartLOcalPlayer() Player GOs represent the player on the server, and has the ability to run comands from the player’s client. spawning GOsthe Network Manager can only spawn and synchronize GOs from registered prefabs, and these prefabs must have a NetworkIdentity component spawning GOs with client authorityNetworkServer.SpawnWithClientAuthority(go, NetworkConnection),for these objects, hasAuthority is true on this client and OnStartAuthority() is called on this client. Spawned with client authority must have LocalPlayerAuthority set to NetworkIdentity, state synchronization[SyncVars] synchronzed from server to client; if opposite, use [Commands] the state of SyncVars is applied to GO on clients before OnStartClient() called. engine and editor integration NetworkIdentity component for networked objects NetworkBehaviour for networked scripts configurable automatic synchronization of object transforms automatic snyc var build-in Internet servicesnetwork visibilityrelates to whether data should or not sent about the GOs to a particulr clinet. method1: add Network Proximity Checker component to networked GOmethod2: Scene GOs saved as part of a Scene, no runtime spawn actions and communicationmethod1: remote actions, call a method across networkmethod2: networking manager/behavior callbacksmethod3: LL network messages host migrationhost: a player whose game is acting as a server and a “local client”remote client: all other playersso when the host left, host need migrate to one remote client to keep the game alive how it works: enable host migration. so Unity will distribute the address of all peers to other peers. so when host left, one peer was selected to be the new host (heart-keeping) network discoveryallow players to find each other on a local area network(LAN) need component , in server mode, the Network Discovery sends broadcast message over the network o nthe specified port in Inspector; in client mode, the component listens for broadcast message on the specified port using transport Layer API (LL API)socket-based networking in the endnetwork is basic cornerstone in server-client applications, Unet is deprecated at this moment, but still many good network resource can take in charge. e.g. websockets, nodejs, and what behind these network protocol or languages, e.g. async/coroutines are charming as well.]]></content>
<tags>
<tag>unity3D</tag>
</tags>
</entry>
<entry>
<title><![CDATA[python generator and asyncio]]></title>
<url>%2F2019%2F08%2F16%2Fpython-generator-and-asynico%2F</url>
<content type="text"><![CDATA[python iteratorin python, iterator is any object that follows iterator protocol. which means should include method: __iter()__, next(), if no next element should raise StopIteration exception. take an example, dict and list have implemented __iter()__ and __getitem__() methods, but not implemented next() method. so they are iterable but not iterator. generatorto support delay operations, only return result when need, but not immediately return generator funcion (with yield) generator expression Python generator in other languages is called coroutines yieldyield is almost of return, but it pauses every step after executing the line with yield, and till call next() will continue from the next line of yield. 12345678while True: sim.run(timeout, cb)def cb(): a = 1 yield print(a) a += 1 coroutinesexample: 12345678def cor1(name): print("start cor1..name..", name) x = yield name print("send value", x)cor_ = cor1("zj")print("next return", next(cor_))print("send return", cor_.send(6)) to run coroutine, need first call next(), then send() is called. namely, if not call next() first, send() will wait, and never be called. the thing is, when define a python generator/coroutine, it never will run; only through await(), next() call first, which then trigger the generator/coroutine start. asyncioasyncronized IO, event-driven coroutine, so users can add async/await to time-consuming IO. event-loop event loop will always run, track events and enqueue them, when idle dequeue event and call event-handling() to deal with it. asyncio.Task awaitawait only decorate async/coroutine, which is a waitable object, it works to hold the current coroutine(async func A) and wait the other coroutine(async func B) to the end. 123456789async def funcB(): return 1async def funcA(): result = await funcB() return resultrun(funcA()) multi-task coroutines1234567891011121314151617181920212223242526272829303132333435363738loop.create_task()run_until_complete() #block the thread untill coroutine completedasyncio.sleep() #nonblock event-loop, with `await`, will return the control-priority to event-loop, at the end of sleep, control-priority will back to this coroutine asyncio.wait(), #nonblock event-loop, immeditialy return coroutine object, and add this corutine object to event-loop``` the following example: get the event-loop thread, add coroutine objct(task) in this event-loop, execute task till end.```pythonimport asyncioasync def cor1(name): print("executing: ", name) await asyncio.sleep(1) print("executed: ", name)loop = asyncio.get_event_loop()tasks = [cor1("zj_" + str(i)) for i in range(3)]wait_cor = asyncio.wait(tasks)loop.run_until_complete(wait_cor)loop.close()``` ### dynamically add coroutine```shell loop.call_soon_threadsafe() # add coroutines sequencially asyncio.run_coroutine_threadsafe() #add coroutines async add coroutine sequenciallyin this sample, main_thread(_loop) will sequencely run from begining to end, during running, there are two coroutines registered, when thread-safe, these two coroutines will be executed. 
the whole process actually looks like run in sequencialy 123456789101112131415161718192021222324252627import asynciofrom threading import Threaddef start_loop(loop): asyncio.set_event_loop(loop) loop.run_forever()def thread_(name): print("executing name:", name) return "return nam:" + name_loop = asyncio.new_event_loop()t = Thread(target=start_loop, args=(_loop,)) #is a thread or coroutine ?t.start()handle = _loop.call_soon_threadsafe(thread_, "zj")handle.cancel()_loop.call_soon_threadsafe(thread_, "zj2")print("main thread non blocking...")_loop.call_soon_threadsafe(thread_, "zj3")print("main thread on going...") add coroutines asyncin this way, add/register async coroutine objects to the event-loop and execute the coroutines when thead-safe 1234567891011future = asyncio.run_coroutine_threadsafe(thread_("zj"), _loop)print(future.result())asyncio.run_coroutine_threadsafe(thread_("zj2"), _loop)print("main thread non blocking...")asyncio.run_coroutine_threadsafe(thread_("zj3"), _loop)print("main thread on going...") async callback vs coroutinescompare callback and coroutines is hot topic in networking and web-js env.]]></content>
<tags>
<tag>python</tag>
</tags>
</entry>
<entry>
<title><![CDATA[dns server in lan]]></title>
<url>%2F2019%2F08%2F10%2Fdns-server-in-lan%2F</url>
<content type="text"><![CDATA[DNS server in LANto use domain name(e.g. gitlab.com) in LAN rather than IP, it needs every local host machine to store all key-values:: host-IP. if the LAN has many host machines, it will be difficult to maintain. Setting up DNS server will help to automatically map the ip to domain or reverse in the LAN. bind912345678910111213141516171819202122232425262728293031323334353637383940apt-get install bind9 ``` ### /etc/bind/named.conf.local```shellzone "gitlab.com" { type: master; file "/etc/bind/db.ip2gitlab.com" ;}; zone "101.20.10.in-addr.arpa" { type: master; file "/etc/bind/db.gitlab2ip.com" ;};``` ### /etc/bind/db.gitlab2ip.com [dns zone file format](https://help.dyn.com/how-to-format-a-zone-file/)gitlab2ip zone file is mapping from domain to ip, as following sample, it works like: www.$ORIGIN --> 10.20.101.119 ```shell; command$TTL 6000;@ refer to current zone file; DNS-server-FDNQ notification-email$ORIGIN gitlab.com@ IN SOA server email ( 2 ; 1d ; 1h ; 5min ; )@ IN NS serverwww IN A 10.20.101.119server IN A 10.20.101.119 /etc/bind/db.ip2gitlab.comip2gitlab zone file is from ip to domain mapping, 1234567891011$TTL 6000$ORIGIN 101.20.10.in-addr.arpa@ IN SOA server. email. ( 2 ; 1d ; 1h ; 5min ; )@ IN NS server119 IN A www.gitlab.com119 IN A server.gitlab.com nslookupnslookup www.gitlab.com #dns forward (domain 2 ip) nslookup 10.20.101.119 #reverse (ip 2 domain) settingsif the DNS setted above(DNS-git) is the only DNS server in the LAN, then this DNS works like a gateway, to communicate by domain name, every local host talk to it first, to understand the domain name. but in a large size company LAN newtwork, there may already has a DNS server hosted at IT department (DNS-IT), with a fixed IP e.g. 10.10.101.101, and all localhost machines actually set DNS-IT as the default DNS. DNS-git will work as a sub-DNS server. Inside the small team, either every localhost change default DNS to DNS-git, then DNS-git become the sub-network server. if every localhost still keep DNS-IT, there is no way(?) to use DNS-git service in LAN, and even make conflicts, as the DNS-git localhost machine will listen on all TCP/IP ports, every new gitlab.com access request (input as IP address) will get an output as domain name, but the others can’t understand this domain-name… what happened with two DNS server in LAN ? how email worksMail User Agent(MUA), e.g. Outlook, Foxmail, used to receive and send emails. MUA is not directly sent emails to end users, but through Mail Transfer Agent(MTA), e.g. SendMail, Postfix. an email sent out from MUA will go through one or more MTA, finally reach to Mail Delivery Agent(MDA), the email then store in some database, e.g. mailbox the receiver then use MUA to review the email in the mailbox ps, one day work as a IT admin ….]]></content>
<tags>
<tag>network</tag>
</tags>
</entry>
<entry>
<title><![CDATA[design scenarios in ADS simulation]]></title>
<url>%2F2019%2F07%2F25%2Fdesign-scenarios-in-ADS-simulation%2F</url>
<content type="text"><![CDATA[to design scenarios in self-driving test, one reference is: a framework for automated driving system testable cases and sceanrios, most other scenario classifications have almost the same elements: ODD, e.g. road types, road surfaces; OEDR, namely object and event detection and response, e.g. static obstacles, other road actors, traffic signature, environment conditions, special zones; and failure mode behaviors. in general, test cases can be grouped as black box test, in which the scenario parterners’ behavior is not pre-defined or unpredictable, e.g. random traffic flow(npcs) scenario, or white box test, where the npcs behavior is pre-defined, e.g. following user-define routings. white box testing is helpful to support performance metrics; while black-box testing is helpful to verify the system completeness and robust. as for ADS test, there are a few chanllenges coming from: heuristics decision-making algorithms, deep-learning algorithms, which is not mathematically completed the test case completeness, as the number of tests required to achieve statistically significant to claim safe would be staggering undefined conditions or assumptions a sample test scenario set maybe looks like: ODD OEDR-obj OEDR-loc maneuver in rump static obstacles in front of current lane in rump static obstacles in front of targe lane in rump dynamic obstacles in front of current lane scenarios_runnercarla has offered an scenario engine, which is helpful to define scenarios by test_criteria and behaviors, of course the timeout as well. test_criteria, is like an underline boundary for the scenario to keep, if not, the scenario failed. e.g. max_speed_limitation, collision e.t.c. these are test criterias, no matter simualtion test or physical test, that have to follow the same criterias; in old ways, we always try to find a way to evaluate the simulation result, and thought this may be very complex, but as no clue to go complex further, simple criterias actually is good. even for reinforcement learning, the simple criterias is good enough for the agent to learn the drive policy. of course, I can expect there are some expert system about test criterias. For simulation itself, there has another metric to descibe how close the simulation itself to physical world, namley to performance how well the simulator is, which is beyond here. behaivor is a good way to describe the dynamic processing in the scenario. e.g. npc follow lane to next intersection, and stop in right lane, ego follow npc till the stop area. OpenScenario has similar ideas to descibe the dynamic. to support behaivor, the simulator should have the features to control ego and npc with the atomic behaviors, e.g. lane-follow, lane-change, stop at intersection e.t.c. in lg simulator, npc has simple AI routing and lane-following API, basically is limited to follow pre-defined behaviors; ego has only cruise-control, but external planner is avialable through ROS. for both test_criteria and behavior, carla has a few existing atomic elements, which is a good idea to build up complex scenarios. OpenScenarionot sure if this is a project still on-going, carla has interface and a few other open source parsers there, but as it is a standard in popular projects, e.g. PEGUS, should be worthy to take further study.]]></content>
<tags>
<tag>lgsvl</tag>
<tag>simulation</tag>
</tags>
</entry>
<entry>
<title><![CDATA[model based design sucks]]></title>
<url>%2F2019%2F07%2F18%2Fmodel-based-design-sucks%2F</url>
<content type="text"><![CDATA[AV development includes perception, sensor fusion, location & mapping, decision-making & control (or motion planning), embedded, simulation and maybe many system-glue software tools. L3 planning & control is now expert-based decision system, which basically defines rules to make decision, where model based design(mbd) is a helper. Waymo mentioned their hybrid decision-making solution, basically Machine Learning(ML) will take a big part of the situations, but still space to allow rule-based solution to take priority. when consider to ML decision making, mdb will become less useful. why model-basedTraditional OEMs follow vehicle-level safety requirements(ASIL-D) to develop vehicle products and components, usually can be represented as the V style development, from user requirs, system design, implmenent to test verification. to go through the whole V process take a rather long time, e.g. for a new vehicle model, it means 2~5 years. Commercial hardware and software(mobile apps) products which has lower level safety requirements, however, can iterate in a quicker frequency. the safety requirements drive the product development in a very different way, compared to common Internet products, which include more straight-forward programming skills and software architecture mindset. but to satisfy the additional, or should say the priority safety requirements, how to organize the code is less important than how to verify the functions is to satisfy the safety. so there comes the model-based design, the most-highly feature of which is to support system test and verify at pre-product period. of course, model-based design should be easily to build up prototype and visualize the system, which is the second feature of mbd, working similar like a microsoft vision e.t.c thirdly, from design to product, is auto code generation. which means once the design is verified, you don’t need to go back to write code again, but directly generate code from the design graph. model-based design toolchain is already a whole eco-system, e.g. system design, auto code generator, test tools. and all these tools should be first verified by ASIL-D standard. Internet AI companies once thought it would be easy to take over this traditional development by Internet agile development, while the reality is they still depends on model-based design at first to verify the system, then back to implement code again, which should be more optimized than auto-generated ones, which is one drawbacks of mbd, as mbd is tool-depended, e.g. Matlab, if Matlab doesn’t support some most updated libs, then they are not in the auto-code, and most time Matlab is far behind the stable version of libs outside. what mbd can’t dombd is born to satisfy safety requirments in product development. so any non safety required product won’t use mbd. and by nature, mbd is good at turning mathematical expressions to system languages, and logical relations to state flows, so any non-articulatable system is difficult to represent in mdb languages. in vehicle product development, engine, powertrain, ECU, brake system, ADAS, L3 motion planning, e.t.c have depends heavily on mbd. but also we can predict, L3+ applications arise, with image, cloud point based object detection, data fusion, SLAM, AI-driven planning, IVI, V2X, will hybrid mbd with many Internet code style. industry experience: a metaphysicssome friends say mass-product-experience makes him more value than new birds. 
since industry experience is not transparent, as there is a no clear bar to test the ability/value of the enginer, unlike developers, who can valued by their product, or skills, also the same reason make these guys who stay long in the industry sounds more valued, and they have more likey went through one or many mass product experience. but at most, industry product depends e.g. vehicle, on teamwork, even the team lead can’t make it by himself, unlike developer, a top developer can make a huge difference, much valued than a team of ordinary ones.]]></content>
</entry>
</search>